From ee6ddeab87b78e63d2d2e6a59a0b7a44e68d687c Mon Sep 17 00:00:00 2001 From: Kshitiz Sharma Date: Fri, 9 Jan 2026 11:47:22 +0530 Subject: [PATCH 1/9] Feature-35022 --- src/powershell/tests/Test-Assessment.35022.md | 18 +++ .../tests/Test-Assessment.35022.ps1 | 126 ++++++++++++++++++ 2 files changed, 144 insertions(+) create mode 100644 src/powershell/tests/Test-Assessment.35022.md create mode 100644 src/powershell/tests/Test-Assessment.35022.ps1 diff --git a/src/powershell/tests/Test-Assessment.35022.md b/src/powershell/tests/Test-Assessment.35022.md new file mode 100644 index 000000000..1eaa3045f --- /dev/null +++ b/src/powershell/tests/Test-Assessment.35022.md @@ -0,0 +1,18 @@ +Organizations with large volumes of historical content in SharePoint, OneDrive, and Exchange that predates auto-labeling policy implementation lack visibility into the extent of unclassified sensitive data across their tenants. Auto-labeling policies only classify new and modified content going forward; existing files and emails remain unclassified and invisible to data loss prevention policies that depend on label detection. Without on-demand scans, organizations cannot perform a baseline assessment of sensitive information already present in their environments, making it impossible to quantify compliance risk, plan remediation, or validate that DLP controls are effectively protecting all sensitive data. On-demand scans allow organizations to manually trigger sensitive information type detection across specified SharePoint sites, OneDrive accounts, and Exchange mailboxes, identifying where sensitive data exists and enabling targeted classification through retroactive labeling. Configuring at least one on-demand scan enables organizations to discover and classify historical sensitive data, providing a comprehensive view of their information protection posture beyond the forward-looking coverage of auto-labeling policies and creating a complete baseline for compliance and risk management. 
+ +**Remediation action** + +To enable the default sensitivity label capability for SharePoint document libraries: +1. **Plan your scan strategy** by identifying locations with historical sensitive data (finance, HR, legal departments) that predate auto-labeling policies. +2. **Access the scan creation interface** in the Microsoft Purview Portal: Information Protection > Classifiers > On-demand classification OR Data Loss Prevention > Classifiers > On-demand classification. +3. **Select target locations** (specific SharePoint sites, OneDrive accounts, and/or Exchange mailboxes) and **choose sensitive information types to detect** (credit card numbers, SSNs, healthcare identifiers, trade secrets). +4. **Configure scan settings** including confidence thresholds (lower = more matches but higher false positives; higher = fewer false positives but may miss data) and file type filters. For trainable classifiers, ensure high-quality training data. +5. **Schedule or run the scan** immediately for baseline scans or set recurring schedules (daily/weekly/monthly). Note: Large scans can take days or weeks depending on data volume and may impact resource utilization. +6. **Monitor progress and analyze results** by tracking completion in the Microsoft Purview Portal. Upon completion, identify sensitive data locations, review prevalence by type, and determine remediation actions. 
+ +- [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification) +- [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification) +- [Sensitive information types entity reference](https://learn.microsoft.com/en-us/purview/sensitive-information-type-entity-definitions) + + +%TestResult% diff --git a/src/powershell/tests/Test-Assessment.35022.ps1 b/src/powershell/tests/Test-Assessment.35022.ps1 new file mode 100644 index 000000000..1b16daaa9 --- /dev/null +++ b/src/powershell/tests/Test-Assessment.35022.ps1 @@ -0,0 +1,126 @@ +<# +.SYNOPSIS + On-Demand Scans Configured for Sensitive Information Discovery + +.DESCRIPTION + On-demand scans enable organizations to discover sensitive information in historical + SharePoint, OneDrive, and Exchange content that predates auto-labeling policies. + If no on-demand scans are configured, organizations lack visibility into existing + sensitive data and cannot establish a compliance baseline. 
+ +.NOTES + Test ID: 35022 + Pillar: Data + Risk Level: Medium +#> + +function Test-Assessment-350022 { + [ZtTest( + Category = 'Information Protection', + ImplementationCost = 'Medium', + MinimumLicense = ('Microsoft 365 E5'), + Pillar = 'Data', + RiskLevel = 'Medium', + SfiPillar = 'Protect tenants and production systems', + TenantType = ('Workforce'), + TestId = 35022, + Title = 'On-Demand Scans Configured for Sensitive Information Discovery', + UserImpact = 'Low' + )] + [CmdletBinding()] + param() + + #region Data Collection + Write-PSFMessage '🟦 Start' -Tag Test -Level VeryVerbose + Write-ZtProgress -Activity 'Checking On-Demand Sensitive Information Scans' + + $scans = $null + $errorMsg = $null + $customStatus = $null + + try { + $scans = Get-SensitiveInformationScan -ErrorAction Stop + } + catch { + $errorMsg = $_.Exception.Message + Write-PSFMessage "Error retrieving on-demand scans: $errorMsg" -Level Error + } + #endregion Data Collection + + #region Assessment Logic + if ($errorMsg) { + $passed = $false + $customStatus = 'Investigate' + } + elseif ($scans -and $scans.Count -gt 0) { + $passed = $true + } + else { + $passed = $false + } + #endregion Assessment Logic + + #region Data Processing & Report Generation + if ($errorMsg) { + $testResultMarkdown = "### Investigate`n`n" + $testResultMarkdown += "Unable to retrieve on-demand scan configuration due to an error:`n`n" + $testResultMarkdown += $errorMsg + } + elseif (-not $passed) { + $testResultMarkdown = "❌ No on-demand scans are configured. 
Historical sensitive data cannot be discovered.`n" + } + else { + $testResultMarkdown = "### On-Demand Sensitive Information Discovery Summary`n`n" + $testResultMarkdown += "Total Scans Configured: **$($scans.Count)**`n`n" + + # Define Table Header + $testResultMarkdown += "| Scan name | Status | Workload | Last run | Sensitive info types covered |`n" + $testResultMarkdown += "|-----------|--------|----------|----------|------------------------------|`n" + + foreach ($scan in $scans) { + # 1. Retrieve the matching Rule to find SIT details + # We use SilentlyContinue because a broken/orphan scan might lack a rule + $rule = Get-SensitiveInformationScanRule -Policy $scan.Name -ErrorAction SilentlyContinue + + # 2. Extract SIT Names using the discovered property path + $sitNamesList = @() + if ($rule -and $rule.ContentContainsSensitiveInformation -and $rule.ContentContainsSensitiveInformation.groups -and $rule.ContentContainsSensitiveInformation.groups.sensitivetypes) { + $types = $rule.ContentContainsSensitiveInformation.groups.sensitivetypes + foreach ($t in $types) { + if ($t.Name) { $sitNamesList += $t.Name } + } + } + + # Fallback if list is empty but rule exists (uncommon, but handles potential unexpected structures) + if ($sitNamesList.Count -eq 0) { + if ($rule) { $sitNamesList += "All/None Specific" } + else { $sitNamesList += "Rule Not Found" } + } + + $sitString = $sitNamesList -join ", " + + # 3. Format Other Columns + $scanName = $scan.Name + $status = $scan.SensitiveInformationScanStatus + $workload = if ($scan.Workload) { $scan.Workload -replace ",", ", " } else { "None" } + $lastRun = if ($scan.LastImpactAssessmentStartTime) { $scan.LastImpactAssessmentStartTime.ToString("yyyy-MM-dd") } else { "Never" } + + # 4. 
Append Row to Markdown Table + $testResultMarkdown += "| $scanName | $status | $workload | $lastRun | $sitString |`n" + } + } + #endregion Data Processing & Report Generation + + $testResultDetail = @{ + TestId = '35022' + Title = 'On-Demand Scans Configured for Sensitive Information Discovery' + Status = $passed + Result = $testResultMarkdown + } + + if ($customStatus) { + $testResultDetail.CustomStatus = $customStatus + } + + Add-ZtTestResultDetail @testResultDetail +} From 5e939076ced0d4191fb1fa764a9d39f81de96d85 Mon Sep 17 00:00:00 2001 From: Kshitiz Sharma Date: Fri, 9 Jan 2026 12:40:58 +0530 Subject: [PATCH 2/9] Feature-35022 : function name fix --- src/powershell/tests/Test-Assessment.35022.ps1 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/powershell/tests/Test-Assessment.35022.ps1 b/src/powershell/tests/Test-Assessment.35022.ps1 index 1b16daaa9..62cc40644 100644 --- a/src/powershell/tests/Test-Assessment.35022.ps1 +++ b/src/powershell/tests/Test-Assessment.35022.ps1 @@ -14,7 +14,7 @@ Risk Level: Medium #> -function Test-Assessment-350022 { +function Test-Assessment-35022 { [ZtTest( Category = 'Information Protection', ImplementationCost = 'Medium', From 45715114661fa6def3efe9159fc722db1af50b03 Mon Sep 17 00:00:00 2001 From: Kshitiz Sharma Date: Fri, 9 Jan 2026 13:54:11 +0530 Subject: [PATCH 3/9] md file fix --- src/powershell/tests/Test-Assessment.35022.md | 1 - 1 file changed, 1 deletion(-) diff --git a/src/powershell/tests/Test-Assessment.35022.md b/src/powershell/tests/Test-Assessment.35022.md index 1eaa3045f..f44aadb71 100644 --- a/src/powershell/tests/Test-Assessment.35022.md +++ b/src/powershell/tests/Test-Assessment.35022.md @@ -10,7 +10,6 @@ To enable the default sensitivity label capability for SharePoint document libra 5. **Schedule or run the scan** immediately for baseline scans or set recurring schedules (daily/weekly/monthly). 
Note: Large scans can take days or weeks depending on data volume and may impact resource utilization. 6. **Monitor progress and analyze results** by tracking completion in the Microsoft Purview Portal. Upon completion, identify sensitive data locations, review prevalence by type, and determine remediation actions. -- [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification) - [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification) - [Sensitive information types entity reference](https://learn.microsoft.com/en-us/purview/sensitive-information-type-entity-definitions) From 41d73460cbd8a88fb94143f7e8711f04febf9377 Mon Sep 17 00:00:00 2001 From: Kshitiz Sharma Date: Fri, 9 Jan 2026 13:56:39 +0530 Subject: [PATCH 4/9] md file fix --- src/powershell/tests/Test-Assessment.35022.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/powershell/tests/Test-Assessment.35022.md b/src/powershell/tests/Test-Assessment.35022.md index f44aadb71..c640db646 100644 --- a/src/powershell/tests/Test-Assessment.35022.md +++ b/src/powershell/tests/Test-Assessment.35022.md @@ -2,7 +2,7 @@ Organizations with large volumes of historical content in SharePoint, OneDrive, **Remediation action** -To enable the default sensitivity label capability for SharePoint document libraries: +To configure on-demand scans for sensitive information discovery and classification, follow these steps: 1. **Plan your scan strategy** by identifying locations with historical sensitive data (finance, HR, legal departments) that predate auto-labeling policies. 2. **Access the scan creation interface** in the Microsoft Purview Portal: Information Protection > Classifiers > On-demand classification OR Data Loss Prevention > Classifiers > On-demand classification. 3. 
**Select target locations** (specific SharePoint sites, OneDrive accounts, and/or Exchange mailboxes) and **choose sensitive information types to detect** (credit card numbers, SSNs, healthcare identifiers, trade secrets). From 5bf589c7185245991f884b7d6b9664d9988dc21b Mon Sep 17 00:00:00 2001 From: Kshitiz sharma Date: Tue, 27 Jan 2026 13:06:31 +0530 Subject: [PATCH 5/9] Set of code change --- .../tests/Test-Assessment.35022.ps1 | 169 +++++++----------- 1 file changed, 68 insertions(+), 101 deletions(-) diff --git a/src/powershell/tests/Test-Assessment.35022.ps1 b/src/powershell/tests/Test-Assessment.35022.ps1 index 62cc40644..d89535fe5 100644 --- a/src/powershell/tests/Test-Assessment.35022.ps1 +++ b/src/powershell/tests/Test-Assessment.35022.ps1 @@ -3,124 +3,91 @@ On-Demand Scans Configured for Sensitive Information Discovery .DESCRIPTION - On-demand scans enable organizations to discover sensitive information in historical - SharePoint, OneDrive, and Exchange content that predates auto-labeling policies. - If no on-demand scans are configured, organizations lack visibility into existing - sensitive data and cannot establish a compliance baseline. + Checks if on-demand scans are configured for sensitive information discovery in SharePoint, OneDrive, and Exchange. 
+ Ref: https://learn.microsoft.com/en-us/purview/on-demand-classification .NOTES Test ID: 35022 Pillar: Data Risk Level: Medium + User Impact: Low + Implementation Cost: Medium #> - -function Test-Assessment-35022 { - [ZtTest( - Category = 'Information Protection', - ImplementationCost = 'Medium', - MinimumLicense = ('Microsoft 365 E5'), - Pillar = 'Data', - RiskLevel = 'Medium', - SfiPillar = 'Protect tenants and production systems', - TenantType = ('Workforce'), - TestId = 35022, - Title = 'On-Demand Scans Configured for Sensitive Information Discovery', - UserImpact = 'Low' - )] +function Test-Assessment35022 { [CmdletBinding()] - param() - - #region Data Collection - Write-PSFMessage '🟦 Start' -Tag Test -Level VeryVerbose - Write-ZtProgress -Activity 'Checking On-Demand Sensitive Information Scans' + param( + [Parameter(Mandatory = $false)] + [Object]$Context + ) - $scans = $null - $errorMsg = $null - $customStatus = $null + $TestId = "35022" + $Result = "Investigate" + $Message = "" + $Details = @() try { - $scans = Get-SensitiveInformationScan -ErrorAction Stop - } - catch { - $errorMsg = $_.Exception.Message - Write-PSFMessage "Error retrieving on-demand scans: $errorMsg" -Level Error - } - #endregion Data Collection - - #region Assessment Logic - if ($errorMsg) { - $passed = $false - $customStatus = 'Investigate' - } - elseif ($scans -and $scans.Count -gt 0) { - $passed = $true - } - else { - $passed = $false - } - #endregion Assessment Logic - - #region Data Processing & Report Generation - if ($errorMsg) { - $testResultMarkdown = "### Investigate`n`n" - $testResultMarkdown += "Unable to retrieve on-demand scan configuration due to an error:`n`n" - $testResultMarkdown += $errorMsg - } - elseif (-not $passed) { - $testResultMarkdown = "❌ No on-demand scans are configured. 
Historical sensitive data cannot be discovered.`n" - } - else { - $testResultMarkdown = "### On-Demand Sensitive Information Discovery Summary`n`n" - $testResultMarkdown += "Total Scans Configured: **$($scans.Count)**`n`n" + # Check prerequisites + if (-not (Get-Command Get-SensitiveInformationScan -ErrorAction SilentlyContinue)) { + throw "Command 'Get-SensitiveInformationScan' not found. Ensure ExchangeOnlineManagement module is installed and connected." + } - # Define Table Header - $testResultMarkdown += "| Scan name | Status | Workload | Last run | Sensitive info types covered |`n" - $testResultMarkdown += "|-----------|--------|----------|----------|------------------------------|`n" + # Query 1: Get all on-demand scans + $ScansList = Get-SensitiveInformationScan -ErrorAction Stop - foreach ($scan in $scans) { - # 1. Retrieve the matching Rule to find SIT details - # We use SilentlyContinue because a broken/orphan scan might lack a rule - $rule = Get-SensitiveInformationScanRule -Policy $scan.Name -ErrorAction SilentlyContinue + # Evaluation Logic + if ($null -ne $ScansList -and $ScansList.Count -ge 1) { + $Result = "Pass" + $Message = "At least one on-demand scan is configured in the organization, enabling discovery and classification of historical sensitive information." + } + else { + $Result = "Fail" + $Message = "No on-demand scans are configured in the organization; historical sensitive data cannot be discovered." + } - # 2. 
Extract SIT Names using the discovered property path - $sitNamesList = @() - if ($rule -and $rule.ContentContainsSensitiveInformation -and $rule.ContentContainsSensitiveInformation.groups -and $rule.ContentContainsSensitiveInformation.groups.sensitivetypes) { - $types = $rule.ContentContainsSensitiveInformation.groups.sensitivetypes - foreach ($t in $types) { - if ($t.Name) { $sitNamesList += $t.Name } - } - } + # SIT GUID Mapping + $SitGuidMap = @{ + "50842eb7-edc8-4019-85dd-5a5c1f2bb085" = "Credit Card Number" + "a44669fe-0d48-453d-a9b1-2cc83f2cba77" = "U.S. Social Security Number (SSN)" + "ed36cf51-9d63-40f3-a9a6-5a865c418d21" = "U.S. Bank Account Number" + "48ee9090-3f74-4238-89c9-6c0a93767a8f" = "SWIFT Code" + "50f56e32-3a6f-459f-82e9-e2b27b96b430" = "Drivers License Number (U.S.)" + "65ce4b3d-79b3-46c0-ba9d-8226d98130c8" = "IBAN (International Banking Account Number)" + "3b35900d-fd2d-446b-b3ad-b4723419e2d5" = "ABA Routing Number" + "f3dbc5dd-e2d4-4487-b43c-ebd87f349aa4" = "Canada Social Insurance Number" + "f87b75b6-570d-465d-a91a-f0d9b9e0b000" = "U.K. National Insurance Number (NINO)" + "b3a2fd72-cc1b-40fc-b0dc-6c5ca0e00f6f" = "International Medical Record Number (MRN)" + } - # Fallback if list is empty but rule exists (uncommon, but handles potential unexpected structures) - if ($sitNamesList.Count -eq 0) { - if ($rule) { $sitNamesList += "All/None Specific" } - else { $sitNamesList += "Rule Not Found" } - } + # Process Details + foreach ($ScanSummary in $ScansList) { + # Query 3: Get details for specific scan to ensure we have ItemStatistics + $Scan = Get-SensitiveInformationScan -Identity $ScanSummary.Name -ErrorAction SilentlyContinue + if (-not $Scan) { $Scan = $ScanSummary } - $sitString = $sitNamesList -join ", " + $SitDetails = @() - # 3. 
Format Other Columns - $scanName = $scan.Name - $status = $scan.SensitiveInformationScanStatus - $workload = if ($scan.Workload) { $scan.Workload -replace ",", ", " } else { "None" } - $lastRun = if ($scan.LastImpactAssessmentStartTime) { $scan.LastImpactAssessmentStartTime.ToString("yyyy-MM-dd") } else { "Never" } + # ItemStatistics parsing + if ($Scan.ItemStatistics -and $Scan.ItemStatistics.SIT) { + $Sits = $Scan.ItemStatistics.SIT - # 4. Append Row to Markdown Table - $testResultMarkdown += "| $scanName | $status | $workload | $lastRun | $sitString |`n" - } - } - #endregion Data Processing & Report Generation + # Handle if it's a PSObject (common in deserialized objects) or Dictionary + $SitKeys = if ($Sits -is [System.Collections.IDictionary]) { $Sits.Keys } elseif ($Sits -is [PSCustomObject]) { $Sits.PSObject.Properties.Name } else { $null } - $testResultDetail = @{ - TestId = '35022' - Title = 'On-Demand Scans Configured for Sensitive Information Discovery' - Status = $passed - Result = $testResultMarkdown - } + if ($SitKeys) { + foreach ($Guid in $SitKeys) { + $Count = if ($Sits -is [System.Collections.IDictionary]) { $Sits[$Guid] } else { $Sits.$Guid } + $FriendlyName = if ($SitGuidMap.ContainsKey($Guid)) { $SitGuidMap[$Guid] } else { "Unknown SIT - $Guid" } + $SitDetails += "$FriendlyName: $Count matches" + } + } + } - if ($customStatus) { - $testResultDetail.CustomStatus = $customStatus - } + $SitString = if ($SitDetails.Count -gt 0) { $SitDetails -join ", " } else { "None" } - Add-ZtTestResultDetail @testResultDetail -} + $Details += [PSCustomObject]@{ + Name = $Scan.Name + SensitiveInformationScanStatus = $Scan.SensitiveInformationScanStatus + Workload = if ($Scan.Workload) { $Scan.Workload -join ", " } else { "" } + "Sensitive Information Types Detected" = $SitString + WhenCreatedUTC = $Scan.WhenCreatedUTC + LastScanStartTime = $Scan.LastScanSt From 4ca3b72334c67b733e439b7bc3f3288c5c3678c0 Mon Sep 17 00:00:00 2001 From: Kshitiz sharma Date: Fri, 30 
Jan 2026 19:56:26 +0530 Subject: [PATCH 6/9] 35022-fix --- .../tests/Test-Assessment.35022.ps1 | 232 +++++++++++++----- 1 file changed, 174 insertions(+), 58 deletions(-) diff --git a/src/powershell/tests/Test-Assessment.35022.ps1 b/src/powershell/tests/Test-Assessment.35022.ps1 index d89535fe5..0b7c3542f 100644 --- a/src/powershell/tests/Test-Assessment.35022.ps1 +++ b/src/powershell/tests/Test-Assessment.35022.ps1 @@ -13,81 +13,197 @@ User Impact: Low Implementation Cost: Medium #> + function Test-Assessment35022 { + [ZtTest( + Category = 'Information Protection', + ImplementationCost = 'Medium', + MinimumLicense = 'Microsoft 365 E5', + Pillar = 'Data', + RiskLevel = 'Medium', + SfiPillar = 'Protect tenants and production systems', + TenantType = 'Workforce', + TestId = 35022, + Title = 'On-Demand Scans Configured for Sensitive Information Discovery', + UserImpact = 'Low' + )] [CmdletBinding()] - param( - [Parameter(Mandatory = $false)] - [Object]$Context - ) + param() - $TestId = "35022" - $Result = "Investigate" - $Message = "" - $Details = @() + #region Data Collection + Write-PSFMessage '🟦 Start' -Tag Test -Level VeryVerbose - try { - # Check prerequisites - if (-not (Get-Command Get-SensitiveInformationScan -ErrorAction SilentlyContinue)) { - throw "Command 'Get-SensitiveInformationScan' not found. Ensure ExchangeOnlineManagement module is installed and connected." 
- } + $activity = 'Checking On-Demand Scan Configuration for Sensitive Information Discovery' + Write-ZtProgress -Activity $activity -Status 'Getting on-demand scans' + + $scansList = $null + $errorMsg = $null + try { # Query 1: Get all on-demand scans - $ScansList = Get-SensitiveInformationScan -ErrorAction Stop + $scansList = Get-SensitiveInformationScan -ErrorAction Stop + } + catch { + $errorMsg = $_ + Write-PSFMessage "Error querying on-demand scans: $_" -Level Error + } + #endregion Data Collection + + #region Assessment Logic + if ($errorMsg) { + $passed = $false + $scanCount = 0 + $tableData = $null + $statusCounts = $null + $hasSharePoint = 0 + $hasOneDrive = 0 + $hasExchange = 0 + $mostRecentScan = $null + } + else { + $scanCount = if ($null -ne $scansList) { @($scansList).Count } else { 0 } + $passed = $scanCount -ge 1 + + if ($scanCount -gt 0) { + # SIT GUID Mapping + $sitGuidMap = @{ + "50842eb7-edc8-4019-85dd-5a5c1f2bb085" = "Credit Card Number" + "a44669fe-0d48-453d-a9b1-2cc83f2cba77" = "U.S. Social Security Number (SSN)" + "ed36cf51-9d63-40f3-a9a6-5a865c418d21" = "U.S. Bank Account Number" + "48ee9090-3f74-4238-89c9-6c0a93767a8f" = "SWIFT Code" + "50f56e32-3a6f-459f-82e9-e2b27b96b430" = "Drivers License Number (U.S.)" + "65ce4b3d-79b3-46c0-ba9d-8226d98130c8" = "IBAN (International Banking Account Number)" + "3b35900d-fd2d-446b-b3ad-b4723419e2d5" = "ABA Routing Number" + "f3dbc5dd-e2d4-4487-b43c-ebd87f349aa4" = "Canada Social Insurance Number" + "f87b75b6-570d-465d-a91a-f0d9b9e0b000" = "U.K. National Insurance Number (NINO)" + "b3a2fd72-cc1b-40fc-b0dc-6c5ca0e00f6f" = "International Medical Record Number (MRN)" + } - # Evaluation Logic - if ($null -ne $ScansList -and $ScansList.Count -ge 1) { - $Result = "Pass" - $Message = "At least one on-demand scan is configured in the organization, enabling discovery and classification of historical sensitive information." 
+ # Build table with scan details + $tableData = @() + foreach ($scan in @($scansList)) { + # Get detailed scan info + $scanDetail = Get-SensitiveInformationScan -Identity $scan.Name -ErrorAction SilentlyContinue + if (-not $scanDetail) { $scanDetail = $scan } + + # Parse SIT details + $sitDetails = @() + if ($scanDetail.ItemStatistics -and $scanDetail.ItemStatistics.SIT) { + $sits = $scanDetail.ItemStatistics.SIT + $sitKeys = if ($sits -is [System.Collections.IDictionary]) { $sits.Keys } elseif ($sits -is [PSCustomObject]) { $sits.PSObject.Properties.Name } else { $null } + + if ($sitKeys) { + foreach ($guid in $sitKeys) { + $count = if ($sits -is [System.Collections.IDictionary]) { $sits[$guid] } else { $sits.$guid } + $friendlyName = if ($sitGuidMap.ContainsKey($guid)) { $sitGuidMap[$guid] } else { "Unknown SIT - $guid" } + $sitDetails += "$friendlyName`: $count matches" + } + } + } + + $sitString = if ($sitDetails.Count -gt 0) { $sitDetails -join ", " } else { "None" } + $workload = if ($scanDetail.Workload) { $scanDetail.Workload -join ", " } else { "" } + $lastScanTime = if ($scanDetail.LastScanStartTime) { $scanDetail.LastScanStartTime } else { "" } + + $tableData += [PSCustomObject]@{ + Name = $scanDetail.Name + Status = $scanDetail.SensitiveInformationScanStatus + Workload = $workload + 'SIT Detected' = $sitString + 'Created (UTC)' = $scanDetail.WhenCreatedUTC + 'Last Scan Start' = $lastScanTime + } + } + + # Count scans by status + $statusCounts = @{} + @($scansList) | ForEach-Object { + $status = $_.SensitiveInformationScanStatus + if ($statusCounts.ContainsKey($status)) { + $statusCounts[$status]++ + } else { + $statusCounts[$status] = 1 + } + } + + # Check workload coverage + $hasSharePoint = @($scansList) | Where-Object { $_.Workload -contains "SharePoint" } | Measure-Object | Select-Object -ExpandProperty Count + $hasOneDrive = @($scansList) | Where-Object { $_.Workload -contains "OneDrive" } | Measure-Object | Select-Object -ExpandProperty Count + 
$hasExchange = @($scansList) | Where-Object { $_.Workload -contains "Exchange" } | Measure-Object | Select-Object -ExpandProperty Count + + # Get most recent scan time + $mostRecentScan = @($scansList) | + Where-Object { $_.LastScanStartTime } | + Sort-Object LastScanStartTime -Descending | + Select-Object -First 1 | + Select-Object -ExpandProperty LastScanStartTime } else { - $Result = "Fail" - $Message = "No on-demand scans are configured in the organization; historical sensitive data cannot be discovered." + $tableData = $null + $statusCounts = $null + $hasSharePoint = 0 + $hasOneDrive = 0 + $hasExchange = 0 + $mostRecentScan = $null } + } + #endregion Assessment Logic - # SIT GUID Mapping - $SitGuidMap = @{ - "50842eb7-edc8-4019-85dd-5a5c1f2bb085" = "Credit Card Number" - "a44669fe-0d48-453d-a9b1-2cc83f2cba77" = "U.S. Social Security Number (SSN)" - "ed36cf51-9d63-40f3-a9a6-5a865c418d21" = "U.S. Bank Account Number" - "48ee9090-3f74-4238-89c9-6c0a93767a8f" = "SWIFT Code" - "50f56e32-3a6f-459f-82e9-e2b27b96b430" = "Drivers License Number (U.S.)" - "65ce4b3d-79b3-46c0-ba9d-8226d98130c8" = "IBAN (International Banking Account Number)" - "3b35900d-fd2d-446b-b3ad-b4723419e2d5" = "ABA Routing Number" - "f3dbc5dd-e2d4-4487-b43c-ebd87f349aa4" = "Canada Social Insurance Number" - "f87b75b6-570d-465d-a91a-f0d9b9e0b000" = "U.K. National Insurance Number (NINO)" - "b3a2fd72-cc1b-40fc-b0dc-6c5ca0e00f6f" = "International Medical Record Number (MRN)" + #region Report Generation + if ($errorMsg) { + $testResultMarkdown = "### Investigate`n`n" + $testResultMarkdown += "Unable to retrieve on-demand scan configuration due to error: $errorMsg`n`n" + $testResultMarkdown += "Ensure you have the required permissions (Compliance Administrator, Compliance Data Administrator, or Security Administrator) and that Security & Compliance Center PowerShell is connected via `Connect-IPPSSession`." 
+ } + else { + if ($passed) { + $testResultMarkdown = "✅ At least one on-demand scan is configured in the organization, enabling discovery and classification of historical sensitive information.`n`n" + } + else { + $testResultMarkdown = "❌ No on-demand scans are configured in the organization; historical sensitive data cannot be discovered.`n`n" } - # Process Details - foreach ($ScanSummary in $ScansList) { - # Query 3: Get details for specific scan to ensure we have ItemStatistics - $Scan = Get-SensitiveInformationScan -Identity $ScanSummary.Name -ErrorAction SilentlyContinue - if (-not $Scan) { $Scan = $ScanSummary } + $testResultMarkdown += "### On-Demand Scan Configuration Summary`n`n" - $SitDetails = @() + if ($scanCount -gt 0 -and $tableData) { + # Convert table to markdown + $testResultMarkdown += "**Scan Details:**`n`n" + $testResultMarkdown += "| Name | Status | Workload | SIT Detected | Created (UTC) | Last Scan Start |`n" + $testResultMarkdown += "|------|--------|----------|--------------|---------------|-----------------|`n" - # ItemStatistics parsing - if ($Scan.ItemStatistics -and $Scan.ItemStatistics.SIT) { - $Sits = $Scan.ItemStatistics.SIT + foreach ($row in $tableData) { + $testResultMarkdown += "| $($row.Name) | $($row.Status) | $($row.Workload) | $($row.'SIT Detected') | $($row.'Created (UTC)') | $($row.'Last Scan Start') |`n" + } - # Handle if it's a PSObject (common in deserialized objects) or Dictionary - $SitKeys = if ($Sits -is [System.Collections.IDictionary]) { $Sits.Keys } elseif ($Sits -is [PSCustomObject]) { $Sits.PSObject.Properties.Name } else { $null } + $testResultMarkdown += "`n" - if ($SitKeys) { - foreach ($Guid in $SitKeys) { - $Count = if ($Sits -is [System.Collections.IDictionary]) { $Sits[$Guid] } else { $Sits.$Guid } - $FriendlyName = if ($SitGuidMap.ContainsKey($Guid)) { $SitGuidMap[$Guid] } else { "Unknown SIT - $Guid" } - $SitDetails += "$FriendlyName: $Count matches" - } - } + # Build summary statistics + 
$testResultMarkdown += "**Summary Statistics:**`n`n" + $testResultMarkdown += "* **Total On-Demand Scans Configured:** $scanCount`n" + $testResultMarkdown += "* **Scans by Status:**`n" + foreach ($status in ($statusCounts.Keys | Sort-Object)) { + $testResultMarkdown += " * $status`: $($statusCounts[$status])`n" } + $testResultMarkdown += "* **Locations Scanned:**`n" + $testResultMarkdown += " * SharePoint: $(if ($hasSharePoint -gt 0) { 'Yes' } else { 'No' })`n" + $testResultMarkdown += " * OneDrive: $(if ($hasOneDrive -gt 0) { 'Yes' } else { 'No' })`n" + $testResultMarkdown += " * Exchange: $(if ($hasExchange -gt 0) { 'Yes' } else { 'No' })`n" + $testResultMarkdown += "* **Most Recent Scan Completion:** $(if ($mostRecentScan) { $mostRecentScan } else { 'No completed scans' })`n" + } + else { + $testResultMarkdown += "* **Total On-Demand Scans Configured:** 0`n" + $testResultMarkdown += "* **Status:** No scans are configured`n" + } - $SitString = if ($SitDetails.Count -gt 0) { $SitDetails -join ", " } else { "None" } + $testResultMarkdown += "`n[Manage On-Demand Scans in Microsoft Purview Portal](https://purview.microsoft.com/informationprotection/dataclassification/colddatascans)`n" + } + #endregion Report Generation - $Details += [PSCustomObject]@{ - Name = $Scan.Name - SensitiveInformationScanStatus = $Scan.SensitiveInformationScanStatus - Workload = if ($Scan.Workload) { $Scan.Workload -join ", " } else { "" } - "Sensitive Information Types Detected" = $SitString - WhenCreatedUTC = $Scan.WhenCreatedUTC - LastScanStartTime = $Scan.LastScanSt + $params = @{ + TestId = '35022' + Title = 'On-Demand Scans Configured for Sensitive Information Discovery' + Status = $passed + Result = $testResultMarkdown + } + Add-ZtTestResultDetail @params +} From 22bedca5fd3972d733fd9f6d58d79a3930c01e37 Mon Sep 17 00:00:00 2001 From: Kshitiz sharma Date: Wed, 4 Feb 2026 20:29:50 +0530 Subject: [PATCH 7/9] Finetuned --- src/powershell/tests/35022.md | 289 ++++++++++++++++ 
.../tests/Test-Assessment.35022.ps1 | 321 ++++++++++++------ 2 files changed, 509 insertions(+), 101 deletions(-) create mode 100644 src/powershell/tests/35022.md diff --git a/src/powershell/tests/35022.md b/src/powershell/tests/35022.md new file mode 100644 index 000000000..04d6269d9 --- /dev/null +++ b/src/powershell/tests/35022.md @@ -0,0 +1,289 @@ +--- +author.spec: tygrady +author.doc: +author.dev: kshitiz-prog +--- + +# On-Demand Scans Configured for Sensitive Information Discovery + +## Spec Status + +Completed + +## Documentation Status + +Not started + +## Dev Status + +In progress + +## Minimum License + +Microsoft 365 E5 + +## Supported Clouds + +Global + +## Pillar + +Data + +## SFI Pillar + +Protect tenants and production systems + +## Category + +Information Protection + +## Risk Level + +Medium + +## User Impact + +Low + +## Implementation Cost + +Medium + +## Customer Facing Explanation + +Organizations with large volumes of historical content in SharePoint, OneDrive, and Exchange that predates auto-labeling policy implementation lack visibility into the extent of unclassified sensitive data across their tenants. Auto-labeling policies only classify new and modified content going forward; existing files and emails remain unclassified and invisible to data loss prevention policies that depend on label detection. Without on-demand scans, organizations cannot perform a baseline assessment of sensitive information already present in their environments, making it impossible to quantify compliance risk, plan remediation, or validate that DLP controls are effectively protecting all sensitive data. On-demand scans allow organizations to manually trigger sensitive information type detection across specified SharePoint sites, OneDrive accounts, and Exchange mailboxes, identifying where sensitive data exists and enabling targeted classification through retroactive labeling. 
Configuring at least one on-demand scan enables organizations to discover and classify historical sensitive data, providing a comprehensive view of their information protection posture beyond the forward-looking coverage of auto-labeling policies and creating a complete baseline for compliance and risk management. + +## Query Prerequisites + +**Required PowerShell Modules:** +- ExchangeOnlineManagement v3.5.1+ + +**Required Permissions:** +- Compliance Administrator, Compliance Data Administrator, or Security Administrator role +- Organization Management role (Exchange Online) + +**Connection Requirements:** +- Connection to Security & Compliance Center PowerShell: `Connect-IPPSSession` +- Requires internet connectivity to Microsoft 365 services + +**Notes:** +- On-demand scans are distinct from auto-labeling policies; they are manual discovery operations that scan historical content +- Scans can target SharePoint sites, OneDrive accounts, and Exchange mailboxes +- Multiple scans can be configured, each with different scope and sensitive information type targets +- Scans can be run on a scheduled basis (recurring) or as one-time operations +- Scan results are available in the Microsoft Purview Portal and can be used to inform labeling and remediation strategies +- Scans do not automatically apply labels; they identify where sensitive data exists for manual or policy-based remediation +- Large tenant scans can be resource-intensive and may take days or weeks to complete depending on data volume + +## Check Query + +* Query 1: Q1: Get all on-demand scans configured +`Get-SensitiveInformationScan` + +Documentation: [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification) + +The cmdlet returns all sensitive information scans configured in the tenant, including those that are scheduled, in progress, completed, or waiting to run. This provides a complete view of scan configurations and their status. 
+ +--- + +## Technical Note: Dynamic SIT Name Translation + +When `ItemStatistics` returns SIT GUIDs, the assessment tool must translate these GUIDs to friendly names for user-facing output. Rather than maintaining a static hardcoded mapping, this spec uses **dynamic discovery** via `Get-DlpSensitiveInformationType`: + +```powershell +# Build SIT GUID-to-name mapping dynamically +$sitCatalog = Get-DlpSensitiveInformationType +$sitGuidMap = @{} +foreach ($sit in $sitCatalog) { + $sitGuidMap[$sit.Identity.ToString()] = $sit.Name +} + +# Translate GUIDs from ItemStatistics to friendly names +foreach ($guid in $itemStatistics.SIT.PSObject.Properties.Name) { + $sitName = $sitGuidMap[$guid] + $matchCount = $itemStatistics.SIT.$guid + Write-Output "${sitName}: $matchCount matches" +} +``` + +**Benefits:** +- **Zero Maintenance:** Automatically reflects new SITs when Microsoft releases them +- **Resilient:** Uses official SIT catalog as source of truth +- **Accurate:** Includes custom SITs created in the tenant, not just Microsoft-provided defaults +- **Complete:** Handles all SIT types without manual list updates + +This approach eliminates the maintenance burden of updating a hardcoded GUID map whenever Microsoft adds new SITs to the catalog. + +* Query 2: Q2: Examine on-demand scan configuration details +For each scan returned from Q1, examine the following properties: + - `Name` - Display name of the scan + - `SensitiveInformationScanStatus` - Current status (NotStarted, ImpactAssessmentInProgress, InProgress, Completed, CompletedWithErrors, Paused, Failed) + - `Workload` - Which workloads/locations the scan targets (SharePoint, OneDrive, Exchange) + - `WhenCreatedUTC` - When the scan was created + - `LastScanStartTime` - When the scan last executed (if applicable) + +If at least one scan exists in the tenant, on-demand scanning is configured. Note: Scan status may vary; a configured scan can be in any status (not yet run, in progress, completed, or failed).
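The Q2 property checks can be illustrated with a short sketch. This is not taken from the spec or the patch: `Get-ScanSummary` is a hypothetical helper name, the sample objects are invented stand-ins for `Get-SensitiveInformationScan` output, and the sketch assumes `Workload` may arrive as either a single string or an array.

```powershell
# Hypothetical sample data standing in for Get-SensitiveInformationScan output.
# Property names follow the Q2 list; the values are invented for illustration.
$sampleScans = @(
    [PSCustomObject]@{
        Name                           = 'FinancialDocumentScan'
        SensitiveInformationScanStatus = 'Completed'
        Workload                       = @('SharePoint', 'OneDrive')
        LastScanStartTime              = [datetime]'2024-01-24T09:00:00'
    },
    [PSCustomObject]@{
        Name                           = 'HistoricalEmailScan'
        SensitiveInformationScanStatus = 'NotStarted'
        Workload                       = 'Exchange'
        LastScanStartTime              = $null
    }
)

# Hypothetical helper: summarizes scans by status, workload, and latest start time.
function Get-ScanSummary {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$Scans
    )

    # Tally scans per status value.
    $statusCounts = @{}
    foreach ($scan in $Scans) {
        $status = [string]$scan.SensitiveInformationScanStatus
        if (-not $statusCounts.ContainsKey($status)) { $statusCounts[$status] = 0 }
        $statusCounts[$status]++
    }

    # Flatten Workload, which may be a single string or an array.
    $workloads = @($Scans | ForEach-Object { @($_.Workload) } | Where-Object { $_ } | Sort-Object -Unique)

    # Latest LastScanStartTime among scans that have one.
    $latest = $Scans | Where-Object { $_.LastScanStartTime } |
        Sort-Object LastScanStartTime -Descending | Select-Object -First 1

    [PSCustomObject]@{
        ScanCount     = $Scans.Count
        StatusCounts  = $statusCounts
        Workloads     = $workloads
        MostRecentRun = if ($latest) { $latest.LastScanStartTime } else { $null }
        IsConfigured  = $Scans.Count -ge 1
    }
}

$summary = Get-ScanSummary -Scans $sampleScans
```

Taking the scan objects as a parameter keeps the summary logic testable without a Security & Compliance PowerShell session; in a connected session the same function could presumably be fed `@(Get-SensitiveInformationScan)`.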
+ +* Query 3: Q3: Get details on specific on-demand scan +`Get-SensitiveInformationScan -Identity "" | Select-Object -Property Name, SensitiveInformationScanStatus, Workload, ItemStatistics, LastImpactAssessmentStartTime, LastScanStartTime` + +Documentation: [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification) + +This query returns detailed information about a specific on-demand scan including its current status, target locations, and sensitive information types matched. Review `SensitiveInformationScanStatus` to understand the scan phase (ImpactAssessmentComplete = estimation done, InProgress = scan running, Completed = finished), `ItemStatistics` to see which SITs matched content and their match counts, and `LastScanStartTime` to confirm when enforcement scanning was last performed. + +--- + +## SIT GUID Reference + +**ItemStatistics Property Format:** + +The `ItemStatistics` property returns a JSON object containing SIT GUIDs (not friendly names) with match counts: + +```json +{ + "SIT": { + "50842eb7-edc8-4019-85dd-5a5c1f2bb085": 2, + "a44669fe-0d48-453d-a9b1-2cc83f2cba77": 1, + "d3515d47-d117-4910-bc05-824519861cf2": 5 + } +} +``` + +**Parsing & Translation:** + +The assessment tool **must** translate these GUIDs to friendly SIT names for user-facing output. Use the following reference: + +**Common SIT GUIDs to Friendly Names (Core Financial & PII):** + +| GUID | Friendly Name | +|------|---------------| +| `50842eb7-edc8-4019-85dd-5a5c1f2bb085` | Credit Card Number | +| `a44669fe-0d48-453d-a9b1-2cc83f2cba77` | U.S. Social Security Number (SSN) | +| `ed36cf51-9d63-40f3-a9a6-5a865c418d21` | U.S. Bank Account Number | +| `48ee9090-3f74-4238-89c9-6c0a93767a8f` | SWIFT Code | +| `50f56e32-3a6f-459f-82e9-e2b27b96b430` | Drivers License Number (U.S.)
| +| `65ce4b3d-79b3-46c0-ba9d-8226d98130c8` | IBAN (International Banking Account Number) | +| `3b35900d-fd2d-446b-b3ad-b4723419e2d5` | ABA Routing Number | +| `f3dbc5dd-e2d4-4487-b43c-ebd87f349aa4` | Canada Social Insurance Number | +| `f87b75b6-570d-465d-a91a-f0d9b9e0b000` | U.K. National Insurance Number (NINO) | +| `b3a2fd72-cc1b-40fc-b0dc-6c5ca0e00f6f` | International Medical Record Number (MRN) | + +**Full SIT GUID Reference (For Documentation Only):** + +The table above provides a reference for common SIT GUIDs. **In implementation, use the dynamic catalog lookup shown in the Technical Note above** rather than maintaining a static mapping. + +For research or documentation purposes, Microsoft publishes all available SIT definitions: +- **Source:** [Sensitive Information Type Entity Definitions](https://learn.microsoft.com/en-us/purview/sit-sensitive-information-type-entity-definitions) +- **Format:** Each SIT definition page contains the Entity ID in XML (the GUID you need) +- **Example:** [Credit Card Number SIT Definition](https://learn.microsoft.com/en-us/purview/sit-defn-credit-card-number) shows Entity ID: `50842eb7-edc8-4019-85dd-5a5c1f2bb085` + +**Implementation Guidance:** + +1. Use `Get-DlpSensitiveInformationType` to build dynamic GUID-to-name catalog (see Technical Note) +2. Extract GUIDs from `ItemStatistics.SIT` object +3. Look up each GUID in the dynamic catalog built in step 1 +4. Replace GUID with friendly name in user-facing output +5. Display as: "[Friendly Name]: [Count] matches" (e.g., "Credit Card Number: 2 matches") +6. If a GUID cannot be mapped, display as "[Unknown SIT - GUID]: [Count]" to avoid silent failures +7. 
Note: Custom SITs created in the tenant will be included automatically in the dynamic catalog + +**Notes:** +- ItemStatistics is updated in real-time as the scan progresses +- GUIDs are consistent across all Microsoft 365 tenants and never change +- Custom sensitive information types have unique GUIDs; the dynamic catalog includes them automatically + +--- + +## User facing message + +Pass: At least one on-demand scan is configured in the organization, enabling discovery and classification of historical sensitive information. +Fail: No on-demand scans are configured in the organization; historical sensitive data cannot be discovered. +Investigate: Unable to determine on-demand scan configuration due to permissions issues or query failure. + +## Test evaluation logic + +1. **Build SIT Catalog:** Execute `Get-DlpSensitiveInformationType` and build dynamic GUID-to-name mapping (see Technical Note above) +2. **Query Q1:** Execute `Get-SensitiveInformationScan` to get all on-demand scans +3. **Count Scans:** Count the number of scans returned +4. **Evaluate Pass/Fail:** + - If count ≥ 1, the test passes (at least one scan is configured) + - If count = 0, the test fails (no scans configured) + - If the query fails or cannot be executed due to permissions, mark as Investigate +5. **Translate GUIDs:** For each scan's `ItemStatistics`, translate SIT GUIDs to friendly names using the dynamic catalog built in step 1 +6. 
**Note:** The test passes regardless of scan status (NotStarted, InProgress, Completed, Failed) as long as a scan is configured + +## Test output data + +The test will output on-demand scan configuration statistics: + +**Exact Output Table Format:** +``` +Name | SensitiveInformationScanStatus | Workload | Sensitive Information Types Detected | WhenCreatedUTC | LastScanStartTime +------------------------------|--------------------------------|------------------|----------------------------------------|---------------------|------------------ +FinancialDocumentScan | ImpactAssessmentComplete | SharePoint | Credit Card Number: 2 matches | 2024-01-15 10:30:00 | (empty) +HistoricalEmailScan | InProgress | Exchange | U.S. SSN: 1 match | 2024-01-10 08:15:00 | 2024-01-24 09:00:00 +``` + +**Summary:** +* Total On-Demand Scans Configured: [count] +* Scans by Status: + - Completed: [count] + - In Progress: [count] + - Not Started: [count] + - Failed: [count] + - Other: [count] +* Locations Scanned: + - SharePoint: [Yes/No] + - OneDrive: [Yes/No] + - Exchange: [Yes/No] +* Sensitive Information Types Covered: [List from scan configurations] +* Most Recent Scan Completion: [date] +* Status: Pass/Fail/Investigate + +Link to portal: [Microsoft Purview Portal > Information Protection > Classifiers > On-demand classification](https://purview.microsoft.com/informationprotection/dataclassification/colddatascans) or [Microsoft Purview Portal > Data Loss Prevention > Classifiers > On-demand classification](https://purview.microsoft.com/datalossprevention/dataclassification/colddatascans) + +--- + +## Check Results + +**Result Summary:** +- **Pass:** At least one on-demand scan is configured for sensitive information discovery. +- **Fail:** No on-demand scans are configured; historical sensitive data cannot be discovered or classified. 
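The Pass, Fail, and Investigate outcomes can be sketched as a small function. This is an illustrative sketch rather than the shipped test: `Get-OnDemandScanVerdict` and its `-Query` parameter are hypothetical names, and the default query assumes an active `Connect-IPPSSession` session.

```powershell
# Hypothetical helper: maps the result of the scan query to Pass / Fail / Investigate.
function Get-OnDemandScanVerdict {
    param(
        # Script block that returns the scans. The default calls the real cmdlet,
        # which only works in a connected Security & Compliance PowerShell session.
        [scriptblock]$Query = { Get-SensitiveInformationScan -ErrorAction Stop }
    )

    try {
        # Wrap in @() so zero, one, or many results all expose .Count.
        $scans = @(& $Query)
    }
    catch {
        # Query failure or missing permissions: the result cannot be determined.
        return 'Investigate'
    }

    # Any configured scan passes, regardless of its status.
    if ($scans.Count -ge 1) { 'Pass' } else { 'Fail' }
}

# Usage with stand-in queries (no tenant connection required):
$verdictPass        = Get-OnDemandScanVerdict -Query { [PSCustomObject]@{ Name = 'DemoScan' } }
$verdictFail        = Get-OnDemandScanVerdict -Query { }
$verdictInvestigate = Get-OnDemandScanVerdict -Query { throw 'Access denied' }
```

Injecting the query as a script block is a design choice made for this sketch so that each of the three branches can be exercised offline.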
+ +**Expected Details:** +- Total on-demand scans configured: [count from Q1] +- Scan names: [list from Q1 Name property] +- Scan status breakdown: Count by SensitiveInformationScanStatus (ImpactAssessmentComplete, InProgress, NotStarted, Failed, other) +- Location coverage: Review Workload property - SharePoint [Yes/No], OneDrive [Yes/No], Exchange [Yes/No] +- Sensitive information types covered: [Friendly names with match counts, translated from ItemStatistics property GUIDs using dynamic catalog lookup] +- Most recent scan/assessment: [Max/latest date from LastImpactAssessmentStartTime or LastScanStartTime property] +- Historical data discovery: [Enabled if count ≥ 1, Disabled if count = 0] +- SIT Catalog Source: Results reflect current `Get-DlpSensitiveInformationType` catalog (includes Microsoft-provided and custom SITs) + +**Portal Link:** [On-demand scans in Microsoft Purview portal](https://purview.microsoft.com/informationprotection/dataclassification/colddatascans) + +**Remediation Steps:** +1. Navigate to Information Protection > Classifiers > On-demand classification in Microsoft Purview +2. Select "Create scan" to begin configuration +3. Choose target locations (SharePoint, OneDrive, Exchange) +4. Select sensitive information types to detect +5. Review Impact Assessment for estimated scope +6. Schedule or start the scan +7. Monitor progress and review results + +**Learn More:** [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification) + +--- + +## Challenges + +- **Scan Duration and Resource Impact:** Comprehensive baseline scans on large deployments (millions of files in SharePoint, OneDrive, and Exchange) can take weeks or months to complete. Scans consume significant tenant resources (compute, storage I/O, network bandwidth) and may impact user productivity. 
Organizations must carefully plan scan timing and scope, often starting with high-priority sites or accounts rather than comprehensive scans, which limits initial visibility. + +- **Detection Accuracy and False Positives:** Sensitive information type detection depends on confidence level tuning—low-confidence settings generate excessive false positives (e.g., sequences matching credit card patterns but not actual numbers), while high-confidence settings may miss real sensitive data. Trainable classifiers add complexity, requiring high-quality seed data and iterative training to achieve accuracy. File type variability (documents, spreadsheets, PDFs, images, encrypted files) affects detection performance, and some SITs perform better in specific formats. + +- **Results Triage and Remediation:** Comprehensive scans of large environments generate thousands or millions of findings, creating an overwhelming data management challenge. Organizations need robust processes to prioritize findings by sensitivity level and location. Beyond triage, clear remediation policies are essential—without defined approaches (retroactive labeling, DLP policy enforcement, access restrictions), scan results may not be acted upon, leaving discovered sensitive data unprotected. + +- **Operational Constraints:** Organizations with litigation holds or strict data retention policies face constraints on scanning and retroactively labeling historical content, requiring legal and compliance review. Multi-tenant environments and federated identities add complexity—coordinating scans across tenants becomes operationally difficult, and fragmented discovery results reduce visibility. + +- **Scan Schedule Optimization:** Determining optimal scan frequency requires balancing discovery completeness with resource impact and organizational change patterns. Too-frequent scans risk performance issues; too-infrequent scans may miss new sensitive data. 
Finding the right cadence depends on data growth rates and risk tolerance. diff --git a/src/powershell/tests/Test-Assessment.35022.ps1 b/src/powershell/tests/Test-Assessment.35022.ps1 index 0b7c3542f..9d72a1fe4 100644 --- a/src/powershell/tests/Test-Assessment.35022.ps1 +++ b/src/powershell/tests/Test-Assessment.35022.ps1 @@ -3,8 +3,11 @@ On-Demand Scans Configured for Sensitive Information Discovery .DESCRIPTION - Checks if on-demand scans are configured for sensitive information discovery in SharePoint, OneDrive, and Exchange. - Ref: https://learn.microsoft.com/en-us/purview/on-demand-classification + Checks if on-demand scans are configured for sensitive information discovery in + SharePoint, OneDrive, and Exchange. Implements dynamic SIT GUID -> friendly name + resolution and generates a markdown result suitable for inclusion in test reports. + + Reference: https://learn.microsoft.com/en-us/purview/on-demand-classification .NOTES Test ID: 35022 @@ -14,7 +17,7 @@ Implementation Cost: Medium #> -function Test-Assessment35022 { +function Test-Assessment-35022 { [ZtTest( Category = 'Information Protection', ImplementationCost = 'Medium', @@ -33,14 +36,49 @@ function Test-Assessment35022 { #region Data Collection Write-PSFMessage '🟦 Start' -Tag Test -Level VeryVerbose - $activity = 'Checking On-Demand Scan Configuration for Sensitive Information Discovery' - Write-ZtProgress -Activity $activity -Status 'Getting on-demand scans' + $activity = 'Checking On-Demand Scans Configured for Sensitive Information Discovery' + Write-ZtProgress -Activity $activity -Status 'Getting SIT Catalog' + $sitGuidMap = @{} $scansList = $null $errorMsg = $null try { - # Query 1: Get all on-demand scans + # Build dynamic SIT catalog from tenant + $sitCatalog = Get-DlpSensitiveInformationType -ErrorAction Stop + foreach ($sit in $sitCatalog) { + try { + $id = $null + $id = $sit.Identity + $name = $sit.Name + $sitGuidMap[$id] = $name + } + catch { + # Ignore individual SIT failures, continue 
+ } + } + } + catch { + Write-PSFMessage "Warning: Failed to build SIT catalog from tenant: $($_.Exception.Message)" -Level Warning + } + + # Fallback common SIT mapping + $fallbackMap = @{ + '50842eb7-edc8-4019-85dd-5a5c1f2bb085' = 'Credit Card Number' + 'a44669fe-0d48-453d-a9b1-2cc83f2cba77' = 'U.S. Social Security Number (SSN)' + 'ed36cf51-9d63-40f3-a9a6-5a865c418d21' = 'U.S. Bank Account Number' + '48ee9090-3f74-4238-89c9-6c0a93767a8f' = 'SWIFT Code' + '50f56e32-3a6f-459f-82e9-e2b27b96b430' = 'Drivers License Number (U.S.)' + '65ce4b3d-79b3-46c0-ba9d-8226d98130c8' = 'IBAN (International Banking Account Number)' + '3b35900d-fd2d-446b-b3ad-b4723419e2d5' = 'ABA Routing Number' + 'f3dbc5dd-e2d4-4487-b43c-ebd87f349aa4' = 'Canada Social Insurance Number' + 'f87b75b6-570d-465d-a91a-f0d9b9e0b000' = 'U.K. National Insurance Number (NINO)' + 'b3a2fd72-cc1b-40fc-b0dc-6c5ca0e00f6f' = 'International Medical Record Number (MRN)' + } + + Write-ZtProgress -Activity $activity -Status 'Getting On-Demand Scans' + + try { $scansList = Get-SensitiveInformationScan -ErrorAction Stop } catch { @@ -50,112 +88,174 @@ function Test-Assessment35022 { #endregion Data Collection #region Assessment Logic + $scanCount = 0 + $passed = $false + $tableData = @() + $statusCounts = @{} + $hasSharePoint = 0 + $hasOneDrive = 0 + $hasExchange = 0 + $customStatus = $null + $mostRecentScan = $null + if ($errorMsg) { $passed = $false - $scanCount = 0 - $tableData = $null - $statusCounts = $null - $hasSharePoint = 0 - $hasOneDrive = 0 - $hasExchange = 0 - $mostRecentScan = $null } else { - $scanCount = if ($null -ne $scansList) { @($scansList).Count } else { 0 } + $scanCount = @($scansList).Count $passed = $scanCount -ge 1 if ($scanCount -gt 0) { - # SIT GUID Mapping - $sitGuidMap = @{ - "50842eb7-edc8-4019-85dd-5a5c1f2bb085" = "Credit Card Number" - "a44669fe-0d48-453d-a9b1-2cc83f2cba77" = "U.S. Social Security Number (SSN)" - "ed36cf51-9d63-40f3-a9a6-5a865c418d21" = "U.S. 
Bank Account Number" - "48ee9090-3f74-4238-89c9-6c0a93767a8f" = "SWIFT Code" - "50f56e32-3a6f-459f-82e9-e2b27b96b430" = "Drivers License Number (U.S.)" - "65ce4b3d-79b3-46c0-ba9d-8226d98130c8" = "IBAN (International Banking Account Number)" - "3b35900d-fd2d-446b-b3ad-b4723419e2d5" = "ABA Routing Number" - "f3dbc5dd-e2d4-4487-b43c-ebd87f349aa4" = "Canada Social Insurance Number" - "f87b75b6-570d-465d-a91a-f0d9b9e0b000" = "U.K. National Insurance Number (NINO)" - "b3a2fd72-cc1b-40fc-b0dc-6c5ca0e00f6f" = "International Medical Record Number (MRN)" - } + foreach ($scan in $scansList) { + # Use scan object directly - already contains full details from Get-SensitiveInformationScan + # Normalize fields + $name = $scan.Name + $status = $scan.SensitiveInformationScanStatus - # Build table with scan details - $tableData = @() - foreach ($scan in @($scansList)) { - # Get detailed scan info - $scanDetail = Get-SensitiveInformationScan -Identity $scan.Name -ErrorAction SilentlyContinue - if (-not $scanDetail) { $scanDetail = $scan } + # Workload may be string or array + $workload = '' + if ($scan.Workload -is [System.Collections.IEnumerable] -and -not ($scan.Workload -is [string])) { + $workload = ($scan.Workload -join ', ') + } + else { + $workload = $scan.Workload + } - # Parse SIT details + # Parse ItemStatistics.SIT $sitDetails = @() - if ($scanDetail.ItemStatistics -and $scanDetail.ItemStatistics.SIT) { - $sits = $scanDetail.ItemStatistics.SIT - $sitKeys = if ($sits -is [System.Collections.IDictionary]) { $sits.Keys } elseif ($sits -is [PSCustomObject]) { $sits.PSObject.Properties.Name } else { $null } + try { + if ($scan.ItemStatistics -and $scan.ItemStatistics.SIT) { + $sits = $scan.ItemStatistics.SIT + + # Determine SIT keys depending on object type + if ($sits -is [System.Collections.IDictionary]) { + $sitKeys = $sits.Keys + } + elseif ($sits -is [PSCustomObject]) { + $sitKeys = $sits.PSObject.Properties | ForEach-Object { $_.Name } + } + else { + $sitKeys = @() + } - 
if ($sitKeys) { foreach ($guid in $sitKeys) { - $count = if ($sits -is [System.Collections.IDictionary]) { $sits[$guid] } else { $sits.$guid } - $friendlyName = if ($sitGuidMap.ContainsKey($guid)) { $sitGuidMap[$guid] } else { "Unknown SIT - $guid" } + $guidString = $guid.ToString().Trim() + + # Obtain count for this GUID + $count = 0 + if ($sits -is [System.Collections.IDictionary]) { + $count = $sits[$guid] + } + else { + try { + $count = $sits.$guid + } + catch { + $count = 0 + } + } + + # Resolve SIT GUID to friendly name + $friendlyName = $null + if ($sitGuidMap.ContainsKey($guidString)) { + $friendlyName = $sitGuidMap[$guidString] + } + elseif ($fallbackMap.ContainsKey($guidString)) { + $friendlyName = $fallbackMap[$guidString] + } + else { + # Attempt to query tenant by Identity as a last resort + try { + $sitObj = Get-DlpSensitiveInformationType -Identity $guidString -ErrorAction SilentlyContinue + if ($sitObj) { + if ($sitObj.PSObject.Properties['Name']) { + $friendlyName = $sitObj.Name + } + elseif ($sitObj.PSObject.Properties['DisplayName']) { + $friendlyName = $sitObj.DisplayName + } + else { + $friendlyName = $sitObj.ToString() + } + } + } + catch { + $friendlyName = "Unknown SIT - $guidString" + } + } + + if (-not $friendlyName) { + $friendlyName = "Unknown SIT - $guidString" + } + $sitDetails += "$friendlyName`: $count matches" } } } + catch { + $sitDetails += "Unable to parse ItemStatistics: $($_.Exception.Message)" + } - $sitString = if ($sitDetails.Count -gt 0) { $sitDetails -join ", " } else { "None" } - $workload = if ($scanDetail.Workload) { $scanDetail.Workload -join ", " } else { "" } - $lastScanTime = if ($scanDetail.LastScanStartTime) { $scanDetail.LastScanStartTime } else { "" } - - $tableData += [PSCustomObject]@{ - Name = $scanDetail.Name - Status = $scanDetail.SensitiveInformationScanStatus - Workload = $workload - 'SIT Detected' = $sitString - 'Created (UTC)' = $scanDetail.WhenCreatedUTC - 'Last Scan Start' = $lastScanTime + $sitString 
= if ($sitDetails.Count -gt 0) { + $sitDetails -join "; " + } + else { + 'None' } - } - # Count scans by status - $statusCounts = @{} - @($scansList) | ForEach-Object { - $status = $_.SensitiveInformationScanStatus - if ($statusCounts.ContainsKey($status)) { - $statusCounts[$status]++ - } else { - $statusCounts[$status] = 1 + $createdUtc = '' + if ($scan.WhenCreatedUTC) { + $createdUtc = $scan.WhenCreatedUTC + } + $lastScanStart = '' + if ($scan.LastScanStartTime) { + $lastScanStart = $scan.LastScanStartTime } - } - # Check workload coverage - $hasSharePoint = @($scansList) | Where-Object { $_.Workload -contains "SharePoint" } | Measure-Object | Select-Object -ExpandProperty Count - $hasOneDrive = @($scansList) | Where-Object { $_.Workload -contains "OneDrive" } | Measure-Object | Select-Object -ExpandProperty Count - $hasExchange = @($scansList) | Where-Object { $_.Workload -contains "Exchange" } | Measure-Object | Select-Object -ExpandProperty Count - - # Get most recent scan time - $mostRecentScan = @($scansList) | - Where-Object { $_.LastScanStartTime } | - Sort-Object LastScanStartTime -Descending | - Select-Object -First 1 | - Select-Object -ExpandProperty LastScanStartTime - } - else { - $tableData = $null - $statusCounts = $null - $hasSharePoint = 0 - $hasOneDrive = 0 - $hasExchange = 0 - $mostRecentScan = $null + # Build output row + $row = [PSCustomObject]@{ + Name = $name + Status = $status + Workload = $workload + 'SIT Detected' = $sitString + 'Created (UTC)' = $createdUtc + 'Last Scan Start' = $lastScanStart + } + $tableData += $row + + # Status counts + if ($status -ne '') { + if ($statusCounts.ContainsKey($status)) { + $statusCounts[$status]++ + } + else { + $statusCounts[$status] = 1 + } + } + } } + + # Workload coverage counts + $hasSharePoint = (@($scansList) | Where-Object { $_.Workload -and (($_.Workload -contains 'SharePoint') -or (($_.Workload -join ',') -match 'SharePoint')) }).Count + $hasOneDrive = (@($scansList) | Where-Object { 
$_.Workload -and (($_.Workload -contains 'OneDrive') -or (($_.Workload -join ',') -match 'OneDrive')) }).Count + $hasExchange = (@($scansList) | Where-Object { $_.Workload -and (($_.Workload -contains 'Exchange') -or (($_.Workload -join ',') -match 'Exchange')) }).Count + + # Most recent scan start + $mostRecentScan = @($scansList) | Where-Object { $_.LastScanStartTime } | Sort-Object LastScanStartTime -Descending | Select-Object -First 1 | ForEach-Object { $_.LastScanStartTime } } + + #endregion Assessment Logic #region Report Generation + $testResultMarkdown = "" + if ($errorMsg) { - $testResultMarkdown = "### Investigate`n`n" - $testResultMarkdown += "Unable to retrieve on-demand scan configuration due to error: $errorMsg`n`n" - $testResultMarkdown += "Ensure you have the required permissions (Compliance Administrator, Compliance Data Administrator, or Security Administrator) and that Security & Compliance Center PowerShell is connected via `Connect-IPPSSession`." + $testResultMarkdown = "Unable to determine on-demand scan configuration due to permissions issues or query failure.`n`n" + $customStatus = 'Investigate' } else { if ($passed) { $testResultMarkdown = "✅ At least one on-demand scan is configured in the organization, enabling discovery and classification of historical sensitive information.`n`n" } @@ -163,47 +263,66 @@ function Test-Assessment35022 { $testResultMarkdown = "❌ No on-demand scans are configured in the organization; historical sensitive data cannot be discovered.`n`n" } - $testResultMarkdown += "### On-Demand Scan Configuration Summary`n`n" + $testResultMarkdown += "### On-Demand scan configuration summary`n`n" if ($scanCount -gt 0 -and $tableData) { - # Convert table to markdown $testResultMarkdown += "**Scan Details:**`n`n" - $testResultMarkdown += "| Name | Status | Workload | SIT Detected | Created (UTC) | Last Scan Start |`n" + $testResultMarkdown += "| Name | Sensitive
information types detected | When created UTC | Last scan start time|`n" $testResultMarkdown += "|------|--------|----------|--------------|---------------|-----------------|`n" foreach ($row in $tableData) { - $testResultMarkdown += "| $($row.Name) | $($row.Status) | $($row.Workload) | $($row.'SIT Detected') | $($row.'Created (UTC)') | $($row.'Last Scan Start') |`n" - } + $nameEsc = $row.Name + $statusEsc = $row.Status + $workEsc = $row.Workload + $sitEsc = $row.'SIT Detected' + $created = if ($row.'Created (UTC)') { + $row.'Created (UTC)' + } + else { + '' + } + $last = if ($row.'Last Scan Start') { + $row.'Last Scan Start' + } + else { + '' + } - $testResultMarkdown += "`n" + $testResultMarkdown += "| $nameEsc | $statusEsc | $workEsc | $sitEsc | $created | $last |`n" + } - # Build summary statistics - $testResultMarkdown += "**Summary Statistics:**`n`n" - $testResultMarkdown += "* **Total On-Demand Scans Configured:** $scanCount`n" - $testResultMarkdown += "* **Scans by Status:**`n" + $testResultMarkdown += "`n**Summary:**`n`n" + $testResultMarkdown += "* **Total on-demand scans configured:** $scanCount`n" + $testResultMarkdown += "* **Scans by status:**`n" foreach ($status in ($statusCounts.Keys | Sort-Object)) { $testResultMarkdown += " * $status`: $($statusCounts[$status])`n" } - $testResultMarkdown += "* **Locations Scanned:**`n" + $testResultMarkdown += "* **Locations scanned:**`n" $testResultMarkdown += " * SharePoint: $(if ($hasSharePoint -gt 0) { 'Yes' } else { 'No' })`n" $testResultMarkdown += " * OneDrive: $(if ($hasOneDrive -gt 0) { 'Yes' } else { 'No' })`n" $testResultMarkdown += " * Exchange: $(if ($hasExchange -gt 0) { 'Yes' } else { 'No' })`n" - $testResultMarkdown += "* **Most Recent Scan Completion:** $(if ($mostRecentScan) { $mostRecentScan } else { 'No completed scans' })`n" + $testResultMarkdown += "* **Most recent scan completion:** $(if ($mostRecentScan) { $mostRecentScan } else { 'No completed scans' })`n" } else { - $testResultMarkdown += "*
**Total On-Demand Scans Configured:** 0`n" + $testResultMarkdown += "* **Total on-demand scans configured:** 0`n" $testResultMarkdown += "* **Status:** No scans are configured`n" } - $testResultMarkdown += "`n[Manage On-Demand Scans in Microsoft Purview Portal](https://purview.microsoft.com/informationprotection/dataclassification/colddatascans)`n" + $testResultMarkdown += "`n[Microsoft Purview Portal > Information Protection > Classifiers > On-demand classification](https://purview.microsoft.com/informationprotection/dataclassification/colddatascans)`n" + $testResultMarkdown += "or" + $testResultMarkdown += "`n[ Microsoft Purview Portal > Data Loss Prevention > Classifiers > On-demand classification ](https://purview.microsoft.com/datalossprevention/dataclassification/colddatascans)`n" + } #endregion Report Generation $params = @{ - TestId = '35022' - Title = 'On-Demand Scans Configured for Sensitive Information Discovery' - Status = $passed - Result = $testResultMarkdown + TestId = '35022' + Title = 'On-Demand Scans Configured for Sensitive Information Discovery' + Status = $passed + Result = $testResultMarkdown + } + if ($null -ne $customStatus) { + $params.CustomStatus = $customStatus } Add-ZtTestResultDetail @params } From 9fe6d2a4fb2b27f4a9c8a13d2cd6f169672120ac Mon Sep 17 00:00:00 2001 From: Kshitiz sharma Date: Wed, 4 Feb 2026 20:31:00 +0530 Subject: [PATCH 8/9] Finetuned code --- src/powershell/tests/Test-Assessment.35022.ps1 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/powershell/tests/Test-Assessment.35022.ps1 b/src/powershell/tests/Test-Assessment.35022.ps1 index 9d72a1fe4..2912e110d 100644 --- a/src/powershell/tests/Test-Assessment.35022.ps1 +++ b/src/powershell/tests/Test-Assessment.35022.ps1 @@ -266,7 +266,7 @@ function Test-Assessment-35022 { $testResultMarkdown += "### On-Demand scan configuration summary`n`n" if ($scanCount -gt 0 -and $tableData) { - $testResultMarkdown += "**Scan Details:**`n`n" + $testResultMarkdown += 
"**Scan details:**`n`n" $testResultMarkdown += "| Name | Sensitive information scan status | Workload | Sensitive information types detected | When created UTC | Last scan start time|`n" $testResultMarkdown += "|------|--------|----------|--------------|---------------|-----------------|`n" From 76ae41642595eacb8dd827786d852a55947d6345 Mon Sep 17 00:00:00 2001 From: Kshitiz sharma Date: Wed, 4 Feb 2026 20:31:47 +0530 Subject: [PATCH 9/9] removal of unwanted file --- src/powershell/tests/35022.md | 289 ---------------------------------- 1 file changed, 289 deletions(-) delete mode 100644 src/powershell/tests/35022.md diff --git a/src/powershell/tests/35022.md b/src/powershell/tests/35022.md deleted file mode 100644 index 04d6269d9..000000000 --- a/src/powershell/tests/35022.md +++ /dev/null @@ -1,289 +0,0 @@ ---- -author.spec: tygrady -author.doc: -author.dev: kshitiz-prog ---- - -# On-Demand Scans Configured for Sensitive Information Discovery - -## Spec Status - -Completed - -## Documentation Status - -Not started - -## Dev Status - -In progress - -## Minimum License - -Microsoft 365 E5 - -## Supported Clouds - -Global - -## Pillar - -Data - -## SFI Pillar - -Protect tenants and production systems - -## Category - -Information Protection - -## Risk Level - -Medium - -## User Impact - -Low - -## Implementation Cost - -Medium - -## Customer Facing Explanation - -Organizations with large volumes of historical content in SharePoint, OneDrive, and Exchange that predates auto-labeling policy implementation lack visibility into the extent of unclassified sensitive data across their tenants. Auto-labeling policies only classify new and modified content going forward; existing files and emails remain unclassified and invisible to data loss prevention policies that depend on label detection. 
Without on-demand scans, organizations cannot perform a baseline assessment of sensitive information already present in their environments, making it impossible to quantify compliance risk, plan remediation, or validate that DLP controls are effectively protecting all sensitive data. On-demand scans allow organizations to manually trigger sensitive information type detection across specified SharePoint sites, OneDrive accounts, and Exchange mailboxes, identifying where sensitive data exists and enabling targeted classification through retroactive labeling. Configuring at least one on-demand scan enables organizations to discover and classify historical sensitive data, providing a comprehensive view of their information protection posture beyond the forward-looking coverage of auto-labeling policies and creating a complete baseline for compliance and risk management. - -## Query Prerequisites - -**Required PowerShell Modules:** -- ExchangeOnlineManagement v3.5.1+ - -**Required Permissions:** -- Compliance Administrator, Compliance Data Administrator, or Security Administrator role -- Organization Management role (Exchange Online) - -**Connection Requirements:** -- Connection to Security & Compliance Center PowerShell: `Connect-IPPSSession` -- Requires internet connectivity to Microsoft 365 services - -**Notes:** -- On-demand scans are distinct from auto-labeling policies; they are manual discovery operations that scan historical content -- Scans can target SharePoint sites, OneDrive accounts, and Exchange mailboxes -- Multiple scans can be configured, each with different scope and sensitive information type targets -- Scans can be run on a scheduled basis (recurring) or as one-time operations -- Scan results are available in the Microsoft Purview Portal and can be used to inform labeling and remediation strategies -- Scans do not automatically apply labels; they identify where sensitive data exists for manual or policy-based remediation -- Large tenant scans can be 
resource-intensive and may take days or weeks to complete depending on data volume - -## Check Query - -* Query 1: Q1: Get all on-demand scans configured -`Get-SensitiveInformationScan` - -Documentation: [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification) - -The cmdlet returns all sensitive information scans configured in the tenant, including those that are scheduled, in progress, completed, or waiting to run. This provides a complete view of scan configurations and their status. - ---- - -## Technical Note: Dynamic SIT Name Translation - -When `ItemStatistics` returns SIT GUIDs, the assessment tool must translate these GUIDs to friendly names for user-facing output. Rather than maintaining a static hardcoded mapping, this spec uses **dynamic discovery** via `Get-DlpSensitiveInformationType`: - -```powershell -# Build SIT GUID-to-name mapping dynamically -$sitCatalog = Get-DlpSensitiveInformationType -$sitGuidMap = @{} -foreach ($sit in $sitCatalog) { - $sitGuidMap[$sit.Identity.ToString()] = $sit.Name -} - -# Translate GUIDs from ItemStatistics to friendly names -foreach ($guid in $itemStatistics.SIT.PSObject.Properties.Name) { - $sitName = $sitGuidMap[$guid] - $matchCount = $itemStatistics.SIT.$guid - # Braces are required: "$sitName:" would be parsed as a scope-qualified variable reference - Write-Output "${sitName}: $matchCount matches" -} -``` - -**Benefits:** -- **Zero Maintenance:** Automatically reflects new SITs when Microsoft releases them -- **Resilient:** Uses official SIT catalog as source of truth -- **Accurate:** Includes custom SITs created in the tenant, not just Microsoft-provided defaults -- **Complete:** Handles all SIT types without manual list updates - -This approach eliminates the maintenance burden of updating a hardcoded GUID map whenever Microsoft adds new SITs to the catalog.
- -* Query 2: Q2: Examine on-demand scan configuration details -For each scan returned from Q1, examine the following properties: - - `Name` - Display name of the scan - - `SensitiveInformationScanStatus` - Current status (NotStarted, ImpactAssessmentInProgress, InProgress, Completed, CompletedWithErrors, Paused, Failed) - - `Workload` - Which workloads/locations the scan targets (SharePoint, OneDrive, Exchange) - - `WhenCreatedUTC` - When the scan was created - - `LastScanStartTime` - When the scan last executed (if applicable) - -If at least one scan exists in the tenant, on-demand scanning is configured. Note: Scan status may vary; a configured scan can be in any status (not yet run, in progress, completed, or failed). - -* Query 3: Q3: Get details on a specific on-demand scan -`Get-SensitiveInformationScan -Identity "" | Select-Object -Property Name, SensitiveInformationScanStatus, Workload, ItemStatistics, LastImpactAssessmentStartTime, LastScanStartTime` - -Documentation: [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification) - -This query returns detailed information about a specific on-demand scan, including its current status, target locations, and sensitive information types matched. Review `SensitiveInformationScanStatus` to understand the scan phase (ImpactAssessmentComplete = estimation done, InProgress = scan running, Completed = finished), `ItemStatistics` to see which SITs matched content and their match counts, and `LastScanStartTime` to confirm when enforcement scanning was last performed.
- ---- - -## SIT GUID Reference - -**ItemStatistics Property Format:** - -The `ItemStatistics` property returns a JSON object containing SIT GUIDs (not friendly names) with match counts: - -```json -{ - "SIT": { - "50842eb7-edc8-4019-85dd-5a5c1f2bb085": 2, - "a44669fe-0d48-453d-a9b1-2cc83f2cba77": 1, - "d3515d47-d117-4910-bc05-824519861cf2": 5 - } -} -``` - -**Parsing & Translation:** - -The assessment tool **must** translate these GUIDs to friendly SIT names for user-facing output. Use the following reference: - -**Common SIT GUIDs to Friendly Names (Core Financial & PII):** - -| GUID | Friendly Name | -|------|---------------| -| `50842eb7-edc8-4019-85dd-5a5c1f2bb085` | Credit Card Number | -| `a44669fe-0d48-453d-a9b1-2cc83f2cba77` | U.S. Social Security Number (SSN) | -| `ed36cf51-9d63-40f3-a9a6-5a865c418d21` | U.S. Bank Account Number | -| `48ee9090-3f74-4238-89c9-6c0a93767a8f` | SWIFT Code | -| `50f56e32-3a6f-459f-82e9-e2b27b96b430` | Drivers License Number (U.S.) | -| `65ce4b3d-79b3-46c0-ba9d-8226d98130c8` | IBAN (International Banking Account Number) | -| `3b35900d-fd2d-446b-b3ad-b4723419e2d5` | ABA Routing Number | -| `f3dbc5dd-e2d4-4487-b43c-ebd87f349aa4` | Canada Social Insurance Number | -| `f87b75b6-570d-465d-a91a-f0d9b9e0b000` | U.K. National Insurance Number (NINO) | -| `b3a2fd72-cc1b-40fc-b0dc-6c5ca0e00f6f` | International Medical Record Number (MRN) | - -**Full SIT GUID Reference (For Documentation Only):** - -The table above provides a reference for common SIT GUIDs. **In implementation, use the dynamic catalog lookup shown in the Technical Note above** rather than maintaining a static mapping. 
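As a rough illustration of the translation step, the sketch below parses an `ItemStatistics` payload of the JSON shape shown above and maps each GUID through a catalog hashtable. It is a minimal sketch under stated assumptions, not the assessment's actual implementation: `ConvertTo-SitSummary` is a hypothetical helper name, the payload is assumed to arrive as a JSON string, and the catalog is passed in as a plain hashtable so the example runs without a tenant connection. Unmapped GUIDs are surfaced with an "Unknown SIT" label rather than dropped.

```powershell
# Hypothetical helper for illustration only; not part of the assessment module.
function ConvertTo-SitSummary {
    param(
        # Assumption: ItemStatistics is available as a JSON string.
        [Parameter(Mandatory)] [string] $ItemStatisticsJson,
        # GUID-to-name catalog, e.g. built from Get-DlpSensitiveInformationType.
        [Parameter(Mandatory)] [hashtable] $SitGuidMap
    )

    $stats = $ItemStatisticsJson | ConvertFrom-Json
    if ($null -eq $stats -or $null -eq $stats.SIT) {
        return  # No SIT statistics recorded; emit nothing.
    }

    foreach ($prop in $stats.SIT.PSObject.Properties) {
        $guid  = $prop.Name
        $count = $prop.Value

        # Fall back to a visible placeholder when the GUID is not in the catalog.
        $name = if ($SitGuidMap.ContainsKey($guid)) { $SitGuidMap[$guid] } else { "Unknown SIT - $guid" }
        $unit = if ($count -eq 1) { 'match' } else { 'matches' }

        # Braces keep PowerShell from reading "$name:" as a scope-qualified variable.
        "${name}: $count $unit"
    }
}

# Example call with a one-entry catalog and one unmapped (placeholder) GUID.
$catalog = @{ '50842eb7-edc8-4019-85dd-5a5c1f2bb085' = 'Credit Card Number' }
$json    = '{"SIT":{"50842eb7-edc8-4019-85dd-5a5c1f2bb085":2,"00000000-0000-0000-0000-000000000000":1}}'
ConvertTo-SitSummary -ItemStatisticsJson $json -SitGuidMap $catalog
```

Passing the catalog in as a parameter keeps the formatting logic testable offline; in a live session the hashtable would presumably be populated from the SIT catalog cmdlet instead.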
- -For research or documentation purposes, Microsoft publishes all available SIT definitions: -- **Source:** [Sensitive Information Type Entity Definitions](https://learn.microsoft.com/en-us/purview/sit-sensitive-information-type-entity-definitions) -- **Format:** Each SIT definition page contains the Entity ID in XML (the GUID you need) -- **Example:** [Credit Card Number SIT Definition](https://learn.microsoft.com/en-us/purview/sit-defn-credit-card-number) shows Entity ID: `50842eb7-edc8-4019-85dd-5a5c1f2bb085` - -**Implementation Guidance:** - -1. Use `Get-DlpSensitiveInformationType` to build dynamic GUID-to-name catalog (see Technical Note) -2. Extract GUIDs from `ItemStatistics.SIT` object -3. Look up each GUID in the dynamic catalog built in step 1 -4. Replace GUID with friendly name in user-facing output -5. Display as: "[Friendly Name]: [Count] matches" (e.g., "Credit Card Number: 2 matches") -6. If a GUID cannot be mapped, display as "[Unknown SIT - GUID]: [Count]" to avoid silent failures -7. Note: Custom SITs created in the tenant will be included automatically in the dynamic catalog - -**Notes:** -- ItemStatistics is updated in real-time as the scan progresses -- GUIDs are consistent across all Microsoft 365 tenants and never change -- Custom sensitive information types have unique GUIDs; the dynamic catalog includes them automatically - ---- - -## User facing message - -Pass: At least one on-demand scan is configured in the organization, enabling discovery and classification of historical sensitive information. -Fail: No on-demand scans are configured in the organization; historical sensitive data cannot be discovered. -Investigate: Unable to determine on-demand scan configuration due to permissions issues or query failure. - -## Test evaluation logic - -1. **Build SIT Catalog:** Execute `Get-DlpSensitiveInformationType` and build dynamic GUID-to-name mapping (see Technical Note above) -2. 
**Query Q1:** Execute `Get-SensitiveInformationScan` to get all on-demand scans -3. **Count Scans:** Count the number of scans returned -4. **Evaluate Pass/Fail:** - - If count ≥ 1, the test passes (at least one scan is configured) - - If count = 0, the test fails (no scans configured) - - If the query fails or cannot be executed due to permissions, mark as Investigate -5. **Translate GUIDs:** For each scan's `ItemStatistics`, translate SIT GUIDs to friendly names using the dynamic catalog built in step 1 -6. **Note:** The test passes regardless of scan status (NotStarted, InProgress, Completed, Failed) as long as a scan is configured - -## Test output data - -The test will output on-demand scan configuration statistics: - -**Exact Output Table Format:** -``` -Name | SensitiveInformationScanStatus | Workload | Sensitive Information Types Detected | WhenCreatedUTC | LastScanStartTime -------------------------------|--------------------------------|------------------|----------------------------------------|---------------------|------------------ -FinancialDocumentScan | ImpactAssessmentComplete | SharePoint | Credit Card Number: 2 matches | 2024-01-15 10:30:00 | (empty) -HistoricalEmailScan | InProgress | Exchange | U.S. 
SSN: 1 match | 2024-01-10 08:15:00 | 2024-01-24 09:00:00 -``` - -**Summary:** -* Total On-Demand Scans Configured: [count] -* Scans by Status: - - Completed: [count] - - In Progress: [count] - - Not Started: [count] - - Failed: [count] - - Other: [count] -* Locations Scanned: - - SharePoint: [Yes/No] - - OneDrive: [Yes/No] - - Exchange: [Yes/No] -* Sensitive Information Types Covered: [List from scan configurations] -* Most Recent Scan Completion: [date] -* Status: Pass/Fail/Investigate - -Link to portal: [Microsoft Purview Portal > Information Protection > Classifiers > On-demand classification](https://purview.microsoft.com/informationprotection/dataclassification/colddatascans) or [Microsoft Purview Portal > Data Loss Prevention > Classifiers > On-demand classification](https://purview.microsoft.com/datalossprevention/dataclassification/colddatascans) - ---- - -## Check Results - -**Result Summary:** -- **Pass:** At least one on-demand scan is configured for sensitive information discovery. -- **Fail:** No on-demand scans are configured; historical sensitive data cannot be discovered or classified. 
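
The pass, fail, and investigate outcomes in this spec reduce to a count check wrapped in error handling. The following is a minimal sketch of that decision, assuming the scan query is supplied as a script block; `Get-ScanTestStatus` is a hypothetical name and is not taken from the shipped test script.

```powershell
# Hypothetical helper for illustration only; not the shipped test logic.
function Get-ScanTestStatus {
    param(
        # Script block that returns the configured scans, for example:
        # { Get-SensitiveInformationScan -ErrorAction Stop }
        [Parameter(Mandatory)] [scriptblock] $Query
    )

    try {
        # Wrap in @() so zero, one, or many results all expose .Count;
        # drop $null so an empty result is not counted as one scan.
        $scans = @(& $Query | Where-Object { $null -ne $_ })
    }
    catch {
        # Query failure (permissions, connectivity) cannot be judged either way.
        return 'Investigate'
    }

    # Any configured scan passes, regardless of its run status.
    if ($scans.Count -ge 1) { 'Pass' } else { 'Fail' }
}

# Example calls with stand-in queries instead of a live tenant.
Get-ScanTestStatus -Query { [pscustomobject]@{ Name = 'FinancialDocumentScan' } }
Get-ScanTestStatus -Query { }
Get-ScanTestStatus -Query { throw 'Access denied' }
```

Only terminating errors reach the `catch` block, which is why the commented example passes `-ErrorAction Stop` to the cmdlet.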
- -**Expected Details:** -- Total on-demand scans configured: [count from Q1] -- Scan names: [list from Q1 Name property] -- Scan status breakdown: Count by SensitiveInformationScanStatus (ImpactAssessmentComplete, InProgress, NotStarted, Failed, other) -- Location coverage: Review Workload property - SharePoint [Yes/No], OneDrive [Yes/No], Exchange [Yes/No] -- Sensitive information types covered: [Friendly names with match counts, translated from ItemStatistics property GUIDs using dynamic catalog lookup] -- Most recent scan/assessment: [Max/latest date from LastImpactAssessmentStartTime or LastScanStartTime property] -- Historical data discovery: [Enabled if count ≥ 1, Disabled if count = 0] -- SIT Catalog Source: Results reflect current `Get-DlpSensitiveInformationType` catalog (includes Microsoft-provided and custom SITs) - -**Portal Link:** [On-demand scans in Microsoft Purview portal](https://purview.microsoft.com/informationprotection/dataclassification/colddatascans) - -**Remediation Steps:** -1. Navigate to Information Protection > Classifiers > On-demand classification in Microsoft Purview -2. Select "Create scan" to begin configuration -3. Choose target locations (SharePoint, OneDrive, Exchange) -4. Select sensitive information types to detect -5. Review Impact Assessment for estimated scope -6. Schedule or start the scan -7. Monitor progress and review results - -**Learn More:** [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification) - ---- - -## Challenges - -- **Scan Duration and Resource Impact:** Comprehensive baseline scans on large deployments (millions of files in SharePoint, OneDrive, and Exchange) can take weeks or months to complete. Scans consume significant tenant resources (compute, storage I/O, network bandwidth) and may impact user productivity. 
Organizations must carefully plan scan timing and scope, often starting with high-priority sites or accounts rather than comprehensive scans, which limits initial visibility. - -- **Detection Accuracy and False Positives:** Sensitive information type detection depends on confidence level tuning—low-confidence settings generate excessive false positives (e.g., sequences matching credit card patterns but not actual numbers), while high-confidence settings may miss real sensitive data. Trainable classifiers add complexity, requiring high-quality seed data and iterative training to achieve accuracy. File type variability (documents, spreadsheets, PDFs, images, encrypted files) affects detection performance, and some SITs perform better in specific formats. - -- **Results Triage and Remediation:** Comprehensive scans of large environments generate thousands or millions of findings, creating an overwhelming data management challenge. Organizations need robust processes to prioritize findings by sensitivity level and location. Beyond triage, clear remediation policies are essential—without defined approaches (retroactive labeling, DLP policy enforcement, access restrictions), scan results may not be acted upon, leaving discovered sensitive data unprotected. - -- **Operational Constraints:** Organizations with litigation holds or strict data retention policies face constraints on scanning and retroactively labeling historical content, requiring legal and compliance review. Multi-tenant environments and federated identities add complexity—coordinating scans across tenants becomes operationally difficult, and fragmented discovery results reduce visibility. - -- **Scan Schedule Optimization:** Determining optimal scan frequency requires balancing discovery completeness with resource impact and organizational change patterns. Too-frequent scans risk performance issues; too-infrequent scans may miss new sensitive data. 
Finding the right cadence depends on data growth rates and risk tolerance.