PowerShell Join-Object 3


首先强调,Join-Object是一个自定义命令,不是Powershell内置命令

问题

怎样将两个对象列表合并成一个单独的列表?或者同样的问题,怎样将两个CSV文件合并成一个单独的文件。两个列表或者两个CSV文件能合并的前提是它们有一些共同的东西,例如共同的信仰。

为什么需要连接?

如何你已经对关系数据库的基本概念有所了解,可以跳过下面的这部分介绍,直接进入文章的结尾查看脚本文件。

我们假设你有一个公司,你想跟踪员工的出勤率,可能需要一个简单的员工个人信息表(Employee),例如:

Employee

ID Name 1 John 2 Mark 3 Hanna

 

ID作为员工编号,具有唯一性,不能重复。为什么需要员工编号,因为显示世界可能会出现两位员工的同名同姓。另外一张表格应当是考勤表(Entrance),包含了员工的编号和上班打卡时间如下。

Entrance

Employee  When
1 6/12/2012 08:05:01 AM
1 6/13/2012 07:59:12 AM
1 6/14/2012 07:49:10 AM
2 6/12/2012 10:33:00 AM
2 6/13/2012 10:15:00 AM
44 2/29/2012 01:00:00 AM

N

上面表格中的列”When”,包含了日期和时间,和.NET中与数据库中的DateTime对象一样。从这两张表格中,我们可以得到如下信息。

John(员工编号1)似乎上班比较勤快,一般早上8点左右就到了。

Mark(员工编号2)一般喜欢迟到,稍微晚一点才来上班。

员工编号为44的这条记录是什么情况?他是人是鬼,怎么半夜来公司上班。还有就是他怎么没有出现在员工信息表中,难道离职了?作为数据库的设计人员必须提前处理好,不允许有这样的数据存在。

接下来稍微讨论一下,为什么会有两张表。为什么不把员工信息和考勤表存储在一张表格中?假如这样存储了,可能在考勤表中会出现若干相同的员工信息,也就是被称之为的数据冗余。

 

不同类型的链接

  1. 内连接
    数据库查询:

    SELECT   Employee.Name,Entrance.[When]
    
    FROM    Employee
    INNER JOIN Entrance ON Employee.id = Entrance.EmployeeId

    结果

    Name			When
    John                	2012-06-12 08:05:01.000
    John                	2012-06-13 07:59:12.000
    John                	2012-06-14 07:49:10.000
    Mark                	2012-06-12 10:33:00.000
    Mark                	2012-06-13 10:15:00.000

     

  2. 左外连接
    数据库查询:

    select Employee.Name,Entrance.[When]
    from Employee
    left outer join entrance on 
    Entrance.id = employee.id

    结果

    Name			When
    John                	2012-06-12 08:05:01.000
    John                	2012-06-13 07:59:12.000
    John                	2012-06-14 07:49:10.000
    Mark                	2012-06-12 10:33:00.000
    Mark                	2012-06-13 10:15:00.000
    Hanna               	NULL

     

     

  3. 右外连接
    数据库查询:

    select Employee.Name,Entrance.[When]
    from Employee
    right outer join entrance on 
    Entrance.id = employee.id

    结果

    Name			When
    John                	2012-06-12 08:05:01.000
    John                	2012-06-13 07:59:12.000
    John                	2012-06-14 07:49:10.000
    Mark                	2012-06-12 10:33:00.000
    Mark                	2012-06-13 10:15:00.000
    NULL	2012-02-29 01:00:00.000

     

     

  4. 全连接
    数据库查询:

    select Employee.Name,Entrance.[When]
    from Employee
    full  join entrance on 
    Entrance.id = employee.id

    结果

    Name			When
    John                	2012-06-12 08:05:01.000
    John                	2012-06-13 07:59:12.000
    John                	2012-06-14 07:49:10.000
    Mark                	2012-06-12 10:33:00.000
    Mark                	2012-06-13 10:15:00.000
    Hanna               	NULL
    NULL	2012-02-29 01:00:00.000

在Windows PowerShell 中连接对象

在Windows PowerShell中并没有内置的命令处理对象连接,所以原作者才会自己写了一个名为Join-Object的命令来演示怎样对内存中的对象列表实现连接,当然这些处理的对象也可以来自其它地方,例如CSV文件中。

假如c:\temp\employee.csv文件内容为:

假如c:\temp\entrance.csv文件内容为:

分别将两个CSV文件导入到内存中保存为$employee 和$entrance ,并查看变量的内容:

PS D:\> $employee = Import-Csv c:\temp\employee.csv
PS D:\> $entrance = Import-Csv c:\temp\entrance.csv
PS D:\> $employee | ft -AutoSize

Id Name
-- ----
1  John
2  Mark
3  Hanna

PS D:\> PS D:\> $entrance |ft -AutoSize

EmployeeId When
---------- ----
1          6/12/2012 08:05:01 AM
1          6/13/2012 07:59:12 AM
1          6/14/2012 07:49:10 AM
2          6/12/2012 10:33:00 AM
2          6/13/2012 10:15:00 AM
44         2/29/2012 01:00:00 AM

之所以选择CSV格式的数据,主要是因为可以很直观的展示对象列表中得所有数据。 下面的4个例子演示如何了4中不同方式的连接。

PS D:\> Join-Object -Left $employee -Right $entrance -Where {$args[0].Id -eq $args[1].EmployeeId} -LeftProperties "Name" -RightProperties "When" -Type OnlyIfInBoth

Name When
---- ----
John 6/12/2012 08:05:01 AM
John 6/13/2012 07:59:12 AM
John 6/14/2012 07:49:10 AM
Mark 6/12/2012 10:33:00 AM
Mark 6/13/2012 10:15:00 AM

Join-Object的前两个参数分别指定要连接的两个对象列表。

“Where”参数是连接的条件,$args[0]代表左边列表中的对象,$args[1]代表右边列表中的对象。

参数RightProperties 和LeftProperties 分别代表了输出的新列表中属性的名称。

参数Type是一组枚举名称,包含4个名称AllInLeft, AllInRight, OnlyIfInBoth 和 AllInBoth 我认为这四个名称非常可以容忍非常容易的与SQL中的连接关联起来,但是还是要在下面的表格中给出对应关系。

Join-Object SQL
AllInLeft Left  Outer
AllIInRight Right Outer
OnlyIfInBoth Inner
AllInBoth Full Outer

这是一个左连接的例子

PS C:\temp> Join-Object -Left $employee -Right $entrance -Where {$args[0].Id -eq $args[1].EmployeeId} -LeftProperties "Name" -RightProperties "When" -Type AllInLeft

Name                         When                           
----                         ----                           
John                         6/12/2012 08:05:01 AM     
John                         6/13/2012 07:59:12 AM     
John                         6/14/2012 07:49:10 AM     
Mark                         6/12/2012 10:33:00 AM     
Mark                         6/13/2012 10:15:00 AM     
Hanna

这是一个右连接的例子

PS C:\temp> Join-Object -Left $employee -Right $entrance -Where {$args[0].Id -eq $args[1].EmployeeId} -LeftProperties "Name" -RightProperties "When" -Type AllInRight 

Name                         When                                                     
----                         ----                                                     
John                         6/12/2012 08:05:01 AM    
John                         6/13/2012 07:59:12 AM    
John                         6/14/2012 07:49:10 AM    
Mark                         6/12/2012 10:33:00 AM    
Mark                         6/13/2012 10:15:00 AM    
                             2/29/2012 01:00:00 AM

这是一个全连接的例子

PS C:\temp> Join-Object –Left $employee –Right $entrance –Where {$args[0].Id -eq $args[1].EmployeeId} –LeftProperties "Name" –RightProperties "When" -Type AllInBoth 

Name                        When                           
----                        ----                           
John                        6/12/2012 08:05:01 AM        
John                        6/13/2012 07:59:12 AM        
John                        6/14/2012 07:49:10 AM        
Mark                        6/12/2012 10:33:00 AM        
Mark                        6/13/2012 10:15:00 AM        
Hanna                                                      
                            2/29/2012 01:00:00 AM

连接是一个数据处理时一个很重要的工具,因为绝大多数情况下,多个表格连接在一起,才能反映用户的需求。因为Join-Object非常有用奥! 千呼万唤始出来,下面附上Join-Object命令的内部脚本实现。

Join-Object 脚本

function AddItemProperties($item, $properties, $output)
{
    if($item -ne $null)
    {
        foreach($property in $properties)
        {
            $propertyHash =$property -as [hashtable]
            if($propertyHash -ne $null)
            {
                $hashName=$propertyHash["name"] -as [string]
                if($hashName -eq $null)
                {
                    throw "there should be a string Name"  
                }

                $expression=$propertyHash["expression"] -as [scriptblock]
                if($expression -eq $null)
                {
                    throw "there should be a ScriptBlock Expression"  
                }

                $_=$item
                $expressionValue=& $expression

                $output | add-member -MemberType "NoteProperty" -Name $hashName -Value $expressionValue
            }
            else
            {
                # .psobject.Properties allows you to list the properties of any object, also known as "reflection"
                foreach($itemProperty in $item.psobject.Properties)
                {
                    if ($itemProperty.Name -like $property)
                    {
                        $output | add-member -MemberType "NoteProperty" -Name $itemProperty.Name -Value $itemProperty.Value
                    }
                }
            }
        }
    }
}

function WriteJoinObjectOutput($leftItem, $rightItem, $leftProperties, $rightProperties, $Type)
{
    $output = new-object psobject

    if($Type -eq "AllInRight")
    {
        # This mix of rightItem with LeftProperties and vice versa is due to
        # the switch of Left and Right arguments for AllInRight
        AddItemProperties $rightItem $leftProperties $output
        AddItemProperties $leftItem $rightProperties $output
    }
    else
    {
        AddItemProperties $leftItem $leftProperties $output
        AddItemProperties $rightItem $rightProperties $output
    }
    $output
}

<#
.Synopsis
   Joins two lists of objects
.DESCRIPTION
   Joins two lists of objects
.EXAMPLE
   Join-Object $a $b "Id" ("Name","Salary")
#>
function Join-Object
{
    [CmdletBinding()]
    [OutputType([int])]
    Param
    (
        # List to join with $Right
        [Parameter(Mandatory=$true,
                   Position=0)]
        [object[]]
        $Left,

        # List to join with $Left
        [Parameter(Mandatory=$true,
                   Position=1)]
        [object[]]
        $Right,

        # Condition in which an item in the left matches an item in the right
        # typically something like: {$args[0].Id -eq $args[1].Id}
        [Parameter(Mandatory=$true,
                   Position=2)]
        [scriptblock]
        $Where,

        # Properties from $Left we want in the output.
        # Each property can:
        # - Be a plain property name like "Name"
        # - Contain wildcards like "*"
        # - Be a hashtable like @{Name="Product Name";Expression={$_.Name}}. Name is the output property name
        #   and Expression is the property value. The same syntax is available in select-object and it is 
        #   important for join-object because joined lists could have a property with the same name
        [Parameter(Mandatory=$true,
                   Position=3)]
        [object[]]
        $LeftProperties,

        # Properties from $Right we want in the output.
        # Like LeftProperties, each can be a plain name, wildcard or hashtable. See the LeftProperties comments.
        [Parameter(Mandatory=$true,
                   Position=4)]
        [object[]]
        $RightProperties,

        # Type of join. 
        #   AllInLeft will have all elements from Left at least once in the output, and might appear more than once
        # if the where clause is true for more than one element in right, Left elements with matches in Right are 
        # preceded by elements with no matches. This is equivalent to an outer left join (or simply left join) 
        # SQL statement.
        #  AllInRight is similar to AllInLeft.
        #  OnlyIfInBoth will cause all elements from Left to be placed in the output, only if there is at least one
        # match in Right. This is equivalent to a SQL inner join (or simply join) statement.
        #  AllInBoth will have all entries in right and left in the output. Specifically, it will have all entries
        # in right with at least one match in left, followed by all entries in Right with no matches in left, 
        # followed by all entries in Left with no matches in Right.This is equivallent to a SQL full join.
        [Parameter(Mandatory=$false,
                   Position=5)]
        [ValidateSet("AllInLeft","OnlyIfInBoth","AllInBoth", "AllInRight")]
        [string]
        $Type="OnlyIfInBoth"
    )

    Begin
    {
        # a list of the matches in right for each object in left
        $leftMatchesInRight = new-object System.Collections.ArrayList

        # the count for all matches  
        $rightMatchesCount = New-Object "object[]" $Right.Count

        for($i=0;$i -lt $Right.Count;$i++)
        {
            $rightMatchesCount[$i]=0
        }
    }

    Process
    {
        if($Type -eq "AllInRight")
        {
            # for AllInRight we just switch Left and Right
            $aux = $Left
            $Left = $Right
            $Right = $aux
        }

        # go over items in $Left and produce the list of matches
        foreach($leftItem in $Left)
        {
            $leftItemMatchesInRight = new-object System.Collections.ArrayList
            $null = $leftMatchesInRight.Add($leftItemMatchesInRight)

            for($i=0; $i -lt $right.Count;$i++)
            {
                $rightItem=$right[$i]

                if($Type -eq "AllInRight")
                {
                    # For AllInRight, we want $args[0] to refer to the left and $args[1] to refer to right,
                    # but since we switched left and right, we have to switch the where arguments
                    $whereLeft = $rightItem
                    $whereRight = $leftItem
                }
                else
                {
                    $whereLeft = $leftItem
                    $whereRight = $rightItem
                }

                if(Invoke-Command -ScriptBlock $where -ArgumentList $whereLeft,$whereRight)
                {
                    $null = $leftItemMatchesInRight.Add($rightItem)
                    $rightMatchesCount[$i]++
                }

            }
        }

        # go over the list of matches and produce output
        for($i=0; $i -lt $left.Count;$i++)
        {
            $leftItemMatchesInRight=$leftMatchesInRight[$i]
            $leftItem=$left[$i]

            if($leftItemMatchesInRight.Count -eq 0)
            {
                if($Type -ne "OnlyIfInBoth")
                {
                    WriteJoinObjectOutput $leftItem  $null  $LeftProperties  $RightProperties $Type
                }

                continue
            }

            foreach($leftItemMatchInRight in $leftItemMatchesInRight)
            {
                WriteJoinObjectOutput $leftItem $leftItemMatchInRight  $LeftProperties  $RightProperties $Type
            }
        }
    }

    End
    {
        #produce final output for members of right with no matches for the AllInBoth option
        if($Type -eq "AllInBoth")
        {
            for($i=0; $i -lt $right.Count;$i++)
            {
                $rightMatchCount=$rightMatchesCount[$i]
                if($rightMatchCount -eq 0)
                {
                    $rightItem=$Right[$i]
                    WriteJoinObjectOutput $null $rightItem $LeftProperties $RightProperties $Type
                }
            }
        }
    }
}

原文地址:http://blogs.msdn.com/b/powershell/archive/2012/07/13/join-object.aspx
原作者:PowerShell Team

本文链接: https://www.pstips.net/powershell-join-object.html
请尊重原作者和编辑的辛勤劳动,欢迎转载,并注明出处!

关于 Mooser Lee

我是一个Powershell的爱好者,创建了PowerShell中文博客,热衷于Powershell技术的搜集和分享。本站部分内容来源于互联网,不足之处敬请谅解,并欢迎您批评指正。

发表评论

您的电子邮箱地址不会被公开。 必填项已用 * 标注

3 条评论 “PowerShell Join-Object

  • Travis

    PS C:Userstravxie> $employeeemployee id name ———– —- 1 John 2 mark 3 hanna PS C:Userstravxie> $entranceemployeeid when ———- —- 1 6/12/2012 08:05:01 AM 1 6/13/2012 07:59:12 AM 1 6/14/2012 07:49:10 AM 2 6/12/2012 10:33:00 AM 2 6/13/2012 10:15:00 AM 44 2/29/2012 01:00:00 AM PS C:Userstravxie> Join-Object -Left $employee -Right $entrance -Where {$args[0].Id -eq $args .EmployeeId} -LeftProperties “Name” -RightProperties “When” -Type OnlyIfInBothPS C:Userstravxie> $a=Join-Object -Left $employee -Right $entrance -Where {$args[0].Id -eq $args .EmployeeId} -LeftProperties “Name” -RightProperties “When” -Type OnlyIfInBothPS C:Userstravxie> $a不懂为何没有输出 没有报错的